From 95c08cc7dcf1ce589c1c8f1b229fc335b6f97662 Mon Sep 17 00:00:00 2001
From: yangdx
Date: Wed, 3 Sep 2025 12:35:52 +0800
Subject: [PATCH] Improve entity extraction prompt clarity by replacing pronouns with specific nouns

---
 lightrag/prompt.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lightrag/prompt.py b/lightrag/prompt.py
index 0ee5ec20..76077af0 100644
--- a/lightrag/prompt.py
+++ b/lightrag/prompt.py
@@ -17,7 +17,7 @@ Given a text document and a list of entity types, identify all entities of those
 1. Recognizing definitively conceptualized entities in text. For each identified entity, extract the following information:
  - entity_name: Name of the entity, use same language as input text. If English, capitalized the name
  - entity_type: Categorize the entity using the provided `Entity_types` list. If a suitable category cannot be determined, classify it as "Other".
- - entity_description: Provide a comprehensive description of the entity's attributes and activities based on the information present in the input text. Do not add external knowledge.
+ - entity_description: Provide a comprehensive description of the entity's attributes and activities based on the information present in the input text. To ensure clarity and precision, all descriptions must replace pronouns and referential terms (e.g., "this document," "our company," "I," "you," "he/she") with the specific nouns they represent.
 2. Format each entity as: ("entity"{tuple_delimiter}{tuple_delimiter}{tuple_delimiter})
 3. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are directly and clearly related based on the text. Unsubstantiated relationships must be excluded from the output.
 For each pair of related entities, extract the following information:
@@ -33,6 +33,7 @@ For each pair of related entities, extract the following information:
 ---Quality Guidelines---
 - Only extract entities that are clearly defined and meaningful in the context
 - Avoid over-interpretation; stick to what is explicitly stated in the text
+- For all output content, explicitly name the subject or object rather than using pronouns
 - Include specific numerical data in entity name when relevant
 - Ensure entity names are consistent throughout the extraction
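
Reviewer note: the pronoun rule added by this patch could be spot-checked on extraction output. The following is a minimal sketch and is not part of this patch or of LightRAG. The function name `find_unresolved_references` and the `REFERENTIAL_TERMS` list are hypothetical, and the list only mirrors the examples quoted in the prompt text above.

```python
import re

# Hypothetical term list, copied from the examples quoted in the prompt text.
REFERENTIAL_TERMS = ["this document", "our company", "I", "you", "he", "she"]

# Whole-word patterns. "I" is matched case-sensitively so that a lowercase
# "i" is not flagged; every other term is matched case-insensitively.
_PATTERNS = [
    re.compile(rf"\b{re.escape(term)}\b", 0 if term == "I" else re.IGNORECASE)
    for term in REFERENTIAL_TERMS
]


def find_unresolved_references(description: str) -> list[str]:
    """Return the referential terms still present in an entity description."""
    found = []
    for term, pattern in zip(REFERENTIAL_TERMS, _PATTERNS):
        if pattern.search(description):
            found.append(term)
    return found
```

An empty result means none of the listed terms were found. It does not prove that the description is free of other pronouns, so a check like this would only complement the prompt rule, not replace it.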