fix: Remove John Doe entity reference due to hallucination issues (#1939)

<!-- .github/pull_request_template.md -->

## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->

## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->

## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Clarifies and tightens coreference resolution guidance across
knowledge-graph prompt templates.
> 
> - Updates coreference rules to emphasize using the most complete,
human-readable identifiers consistently (`generate_graph_prompt*.txt`)
> - Tweaks examples, notably replacing the John Doe example with a
generic "X" case in the one-shot prompt
> - Minor wording/formatting cleanups; no code changes or logic
modifications
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
8499258272. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Refined entity resolution guidance in knowledge graph generation
    prompts to use more generic instructions, improving flexibility and
    consistency in how entities are identified throughout the system.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Vasilije 2026-01-10 08:49:23 +01:00 committed by GitHub
commit f03ab671e6
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
5 changed files with 7 additions and 7 deletions

@@ -19,8 +19,8 @@ The aim is to achieve simplicity and clarity in the knowledge graph.
 - **Naming Convention**: Use snake_case for relationship names, e.g., `acted_in`.
 # 3. Coreference Resolution
 - **Maintain Entity Consistency**: When extracting entities, it's vital to ensure consistency.
-If an entity, such as "John Doe", is mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
-always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "John Doe" as the Persons ID.
+If an entity, is mentioned multiple times in the text but is referred to by different names or pronouns,
+always use the most complete identifier for that entity throughout the knowledge graph.
 Remember, the knowledge graph should be coherent and easily understandable, so maintaining consistency in entity references is crucial.
 # 4. Strict Compliance
 Adhere to the rules strictly. Non-compliance will result in termination
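
A minimal sketch of the "most complete identifier" rule from the hunk above, in Python. It is not code from this repository: `canonical_id`, `resolve_mentions`, and the `PRONOUNS` set are hypothetical names, the mentions are assumed to be already grouped per entity, and "most complete" is approximated here as the named mention with the most words (ties broken by length).

```python
# Hypothetical helper names; not part of the changed prompt files.
PRONOUNS = frozenset({"he", "she", "it", "they", "him", "her", "them"})


def canonical_id(mentions):
    """Pick one identifier for an entity from all of its mentions."""
    # Pronouns can never serve as an identifier, so drop them first.
    names = [m.strip() for m in mentions if m.strip().lower() not in PRONOUNS]
    if not names:
        raise ValueError("no named mention to use as an identifier")
    # Assumption: more words (then more characters) means "more complete".
    return max(names, key=lambda n: (len(n.split()), len(n)))


def resolve_mentions(mentions):
    """Map every mention of one entity to the same canonical identifier."""
    cid = canonical_id(mentions)
    return {m: cid for m in mentions}


if __name__ == "__main__":
    print(resolve_mentions(["Turing", "Alan Turing", "he"]))
```

The word-count heuristic is only a stand-in; a real pipeline would more likely leave this decision to the model that the prompt instructs.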

@@ -22,7 +22,7 @@ You are an advanced algorithm designed to extract structured information to buil
 3. **Coreference Resolution**:
 - Maintain one consistent node ID for each real-world entity.
 - Resolve aliases, acronyms, and pronouns to the most complete form.
-- *Example*: Always use "John Doe" even if later referred to as "Doe" or "he".
+- *Example*: Always use full identifier even if later referred to as in a similar but slightly different way
 **Property & Data Guidelines**:

@@ -42,10 +42,10 @@ You are an advanced algorithm designed to extract structured information from un
 - **Rule**: Resolve all aliases, acronyms, and pronouns to one canonical identifier.
 > **One-Shot Example**:
-> **Input**: "John Doe is an author. Later, Doe published a book. He is well-known."
+> **Input**: "X is an author. Later, Doe published a book. He is well-known."
 > **Output Node**:
 > ```
-> John Doe (Person)
+> X (Person)
 > ```
 ---

@@ -15,7 +15,7 @@ You are an advanced algorithm that extracts structured data into a knowledge gra
 - Properties are key-value pairs; do not use escaped quotes.
 3. **Coreference Resolution**
-- Use a single, complete identifier for each entity (e.g., always "John Doe" not "Joe" or "he").
+- Use a single, complete identifier for each entity
 4. **Relationship Labels**:
 - Use descriptive, lowercase, snake_case names for edges.
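
For the snake_case edge-label rule that closes the hunk above, a small Python sketch of how a label might be normalized after extraction. `to_snake_case` is a hypothetical helper, not something the prompt files or this commit define.

```python
import re


def to_snake_case(label: str) -> str:
    """Normalize an edge label such as "Acted In" or "actedIn" to snake_case."""
    # Insert a space at camelCase boundaries (lowercase/digit followed by uppercase).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Lowercase, then collapse every run of non-alphanumerics into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", spaced.lower()).strip("_")


if __name__ == "__main__":
    for raw in ("Acted In", "actedIn", "ACTED-IN"):
        print(raw, "->", to_snake_case(raw))
```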

@@ -26,7 +26,7 @@ Use **basic atomic types** for node labels. Always prefer general types over spe
 - Good: "Alan Turing", "Google Inc.", "World War II"
 - Bad: "Entity_001", "1234", "he", "they"
 - Never use numeric or autogenerated IDs.
-- Prioritize **most complete form** of entity names for consistency (e.g., always use "John Doe" instead of "John" or "he").
+- Prioritize **most complete form** of entity names for consistency
 2. Dates, Numbers, and Properties
 ---------------------------------
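
The Good/Bad node-ID examples in the last hunk can be turned into a rough post-extraction check. The Python sketch below is an assumption-laden illustration, not part of this commit: `is_readable_node_id`, the `PRONOUNS` set, and the `AUTOGENERATED` pattern are all hypothetical, and the pattern only catches IDs shaped like a word followed by digits.

```python
import re

# Hypothetical pronoun list; extend as needed.
PRONOUNS = frozenset({"he", "she", "it", "they", "him", "her", "them"})
# Hypothetical pattern for autogenerated IDs such as "Entity_001" or "node-42".
AUTOGENERATED = re.compile(r"[A-Za-z]+[_-]?\d+")


def is_readable_node_id(node_id: str) -> bool:
    """Return True if the ID looks like a human-readable entity name."""
    text = node_id.strip()
    if not text or text.isdigit():
        return False  # empty or purely numeric
    if text.lower() in PRONOUNS:
        return False  # unresolved pronoun
    if AUTOGENERATED.fullmatch(text):
        return False  # looks machine-generated
    return True


if __name__ == "__main__":
    for candidate in ("Alan Turing", "Google Inc.", "World War II",
                      "Entity_001", "1234", "he", "they"):
        print(candidate, is_readable_node_id(candidate))
```

The heuristic is deliberately crude: a legitimate name that is a single word plus digits (for example a model number) would be rejected, so a check like this suits logging suspicious IDs better than hard filtering.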