Format entire codebase with ruff and add type hints across all modules: - Apply ruff formatting to all Python files (121 files, 17K insertions) - Add type hints to function signatures throughout lightrag core and API - Update test suite with improved type annotations and docstrings - Add pyrightconfig.json for static type checking configuration - Create prompt_optimized.py and test_extraction_prompt_ab.py test files - Update ruff.toml and .gitignore for improved linting configuration - Standardize code style across examples, reproduce scripts, and utilities
464 lines
31 KiB
Python
464 lines
31 KiB
Python
from __future__ import annotations
|
|
|
|
from typing import Any
|
|
|
|
PROMPTS: dict[str, Any] = {}
|
|
|
|
# All delimiters must be formatted as "<|TOKEN|>" style markers (e.g., "<|#|>" or "<|COMPLETE|>")
|
|
PROMPTS['DEFAULT_TUPLE_DELIMITER'] = '<|#|>'
|
|
PROMPTS['DEFAULT_COMPLETION_DELIMITER'] = '<|COMPLETE|>'
|
|
|
|
PROMPTS['entity_extraction_system_prompt'] = """---Role---
|
|
You are a Knowledge Graph Specialist responsible for extracting entities and relationships from the input text.
|
|
|
|
---Instructions---
|
|
1. **Entity Extraction & Output:**
|
|
* **Identification:** Identify clearly defined and meaningful entities in the input text.
|
|
* **Entity Details:** For each identified entity, extract the following information:
|
|
* `entity_name`: The name of the entity. If the entity name is case-insensitive, capitalize the first letter of each significant word (title case). Ensure **consistent naming** across the entire extraction process.
|
|
* `entity_type`: Categorize the entity using one of the following types: `{entity_types}`. If none of the provided types apply, do not invent a new type; classify it as `Other`.
|
|
* `entity_description`: Provide a concise yet comprehensive description of the entity's attributes and activities, based *solely* on the information present in the input text.
|
|
* **Output Format - Entities:** Output a total of 4 fields for each entity, delimited by `{tuple_delimiter}`, on a single line. The first field *must* be the literal string `entity`.
|
|
* Format: `entity{tuple_delimiter}entity_name{tuple_delimiter}entity_type{tuple_delimiter}entity_description`
|
|
|
|
2. **Relationship Extraction & Output:**
|
|
* **Identification:** Identify meaningful relationships between previously extracted entities. Include:
|
|
* **Direct relationships:** Explicitly stated interactions, actions, or connections.
|
|
* **Comparative relationships:** When entities are explicitly grouped, ranked, or compared (e.g., 'Brazil has 5 wins, Germany has 4' establishes a comparison).
|
|
* **Hierarchical relationships:** Clear part-of, member-of, or type-of connections.
|
|
* **Causal relationships:** Explicit cause-effect connections.
|
|
* **Categorical relationships:** When entities share explicit group membership (e.g., 'World Cup winners include Brazil, Germany, and Italy').
|
|
* **AVOID** purely speculative or inferred connections not supported by text structure.
|
|
* **N-ary Relationship Decomposition:** If a single statement describes a relationship involving more than two entities (an N-ary relationship), decompose it into multiple binary (two-entity) relationship pairs for separate description.
|
|
* **Example:** For "Alice, Bob, and Carol collaborated on Project X," extract binary relationships such as "Alice collaborated with Project X," "Bob collaborated with Project X," and "Carol collaborated with Project X," or "Alice collaborated with Bob," based on the most reasonable binary interpretations.
|
|
* **Relationship Details:** For each binary relationship, extract the following fields:
|
|
* `source_entity`: The name of the source entity. Ensure **consistent naming** with entity extraction. Capitalize the first letter of each significant word (title case) if the name is case-insensitive.
|
|
* `target_entity`: The name of the target entity. Ensure **consistent naming** with entity extraction. Capitalize the first letter of each significant word (title case) if the name is case-insensitive.
|
|
* `relationship_keywords`: One or more high-level keywords summarizing the overarching nature, concepts, or themes of the relationship. Multiple keywords within this field must be separated by a comma `,`. **DO NOT use `{tuple_delimiter}` for separating multiple keywords within this field.**
|
|
* `relationship_description`: A concise explanation of the nature of the relationship between the source and target entities, providing a clear rationale for their connection.
|
|
* **Output Format - Relationships:** Output a total of 5 fields for each relationship, delimited by `{tuple_delimiter}`, on a single line. The first field *must* be the literal string `relation`.
|
|
* Format: `relation{tuple_delimiter}source_entity{tuple_delimiter}target_entity{tuple_delimiter}relationship_keywords{tuple_delimiter}relationship_description`
|
|
* **Relationship Quality Requirement:** Extract relationships that are supported by text structure. Balance precision with graph connectivity:
|
|
* Primary: Direct, causal, and hierarchical relationships with explicit textual support.
|
|
* Secondary: Comparative and categorical relationships when entities are grouped or ranked together.
|
|
* An orphan entity is acceptable only if the text truly provides no structural grouping.
|
|
* Do NOT invent speculative connections, but DO capture explicit groupings and comparisons.
|
|
* **Attribution Verification:** When extracting relationships, ensure the source and target entities are correctly identified from the text. Do not conflate similar entities or transfer attributes from one entity to another.
|
|
|
|
3. **Delimiter Usage Protocol:**
|
|
* The `{tuple_delimiter}` is a complete, atomic marker and **must not be filled with content**. It serves strictly as a field separator.
|
|
* **Incorrect Example:** `entity{tuple_delimiter}Tokyo<|location|>Tokyo is the capital of Japan.`
|
|
* **Correct Example:** `entity{tuple_delimiter}Tokyo{tuple_delimiter}location{tuple_delimiter}Tokyo is the capital of Japan.`
|
|
|
|
4. **Relationship Direction & Duplication:**
|
|
* Treat all relationships as **undirected** unless explicitly stated otherwise. Swapping the source and target entities for an undirected relationship does not constitute a new relationship.
|
|
* Avoid outputting duplicate relationships.
|
|
|
|
5. **Output Order & Prioritization:**
|
|
* Output all extracted entities first, followed by all extracted relationships.
|
|
* Within the list of relationships, prioritize and output those relationships that are **most significant** to the core meaning of the input text first.
|
|
|
|
6. **Context & Objectivity:**
|
|
* Ensure all entity names and descriptions are written in the **third person**.
|
|
* Explicitly name the subject or object; **avoid using pronouns** such as `this article`, `this paper`, `our company`, `I`, `you`, and `he/she`.
|
|
|
|
7. **Language & Proper Nouns:**
|
|
* The entire output (entity names, keywords, and descriptions) must be written in `{language}`.
|
|
* Proper nouns (e.g., personal names, place names, organization names) should be retained in their original language if a proper, widely accepted translation is not available or would cause ambiguity.
|
|
|
|
8. **Completion Signal:** Output the literal string `{completion_delimiter}` only after all entities and relationships, following all criteria, have been completely extracted and outputted.
|
|
|
|
---Examples---
|
|
{examples}
|
|
|
|
---Real Data to be Processed---
|
|
<Input>
|
|
Entity_types: [{entity_types}]
|
|
Text:
|
|
```
|
|
{input_text}
|
|
```
|
|
"""
|
|
|
|
PROMPTS['entity_extraction_user_prompt'] = """---Task---
|
|
Extract entities and relationships from the input text to be processed.
|
|
|
|
---Instructions---
|
|
1. **Strict Adherence to Format:** Strictly adhere to all format requirements for entity and relationship lists, including output order, field delimiters, and proper noun handling, as specified in the system prompt.
|
|
2. **Output Content Only:** Output *only* the extracted list of entities and relationships. Do not include any introductory or concluding remarks, explanations, or additional text before or after the list.
|
|
3. **Completion Signal:** Output `{completion_delimiter}` as the final line after all relevant entities and relationships have been extracted and presented.
|
|
4. **Output Language:** Ensure the output language is {language}. Proper nouns (e.g., personal names, place names, organization names) must be kept in their original language and not translated.
|
|
|
|
<Output>
|
|
"""
|
|
|
|
PROMPTS['entity_continue_extraction_user_prompt'] = """---Task---
|
|
Based on the last extraction task, identify and extract any **missed or incorrectly formatted** entities and relationships from the input text. Pay special attention to **orphan entities** (entities with no relationships).
|
|
|
|
---Instructions---
|
|
1. **Strict Adherence to System Format:** Strictly adhere to all format requirements for entity and relationship lists, including output order, field delimiters, and proper noun handling, as specified in the system instructions.
|
|
2. **Graph Connectivity Check:**
|
|
* Review extracted entities. For any without relationships, check if they appear in:
|
|
- Groupings or lists (categorical relationship candidates)
|
|
- Comparisons or rankings (comparative relationship candidates)
|
|
- Shared contexts with other entities (domain relationship candidates)
|
|
* Add these relationships if supported by text structure.
|
|
* An isolated entity is acceptable only if truly unconnected in the text.
|
|
3. **Focus on Corrections/Additions:**
|
|
* **Do NOT** re-output entities and relationships that were **correctly and fully** extracted in the last task.
|
|
* If an entity or relationship was **missed** in the last task, extract and output it now according to the system format.
|
|
* If an entity or relationship was **truncated, had missing fields, or was otherwise incorrectly formatted** in the last task, re-output the *corrected and complete* version in the specified format.
|
|
4. **Output Format - Entities:** Output a total of 4 fields for each entity, delimited by `{tuple_delimiter}`, on a single line. The first field *must* be the literal string `entity`.
|
|
5. **Output Format - Relationships:** Output a total of 5 fields for each relationship, delimited by `{tuple_delimiter}`, on a single line. The first field *must* be the literal string `relation`.
|
|
6. **Output Content Only:** Output *only* the extracted list of entities and relationships. Do not include any introductory or concluding remarks, explanations, or additional text before or after the list.
|
|
7. **Completion Signal:** Output `{completion_delimiter}` as the final line after all relevant missing or corrected entities and relationships have been extracted and presented.
|
|
8. **Output Language:** Ensure the output language is {language}. Proper nouns (e.g., personal names, place names, organization names) must be kept in their original language and not translated.
|
|
|
|
<Output>
|
|
"""
|
|
|
|
PROMPTS['entity_extraction_examples'] = [
|
|
"""<Input Text>
|
|
```
|
|
while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.
|
|
|
|
Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. "If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us."
|
|
|
|
The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.
|
|
|
|
It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths
|
|
```
|
|
|
|
<Output>
|
|
entity{tuple_delimiter}Alex{tuple_delimiter}person{tuple_delimiter}Alex is a character who experiences frustration and is observant of the dynamics among other characters.
|
|
entity{tuple_delimiter}Taylor{tuple_delimiter}person{tuple_delimiter}Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective.
|
|
entity{tuple_delimiter}Jordan{tuple_delimiter}person{tuple_delimiter}Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device.
|
|
entity{tuple_delimiter}Cruz{tuple_delimiter}person{tuple_delimiter}Cruz is associated with a vision of control and order, influencing the dynamics among other characters.
|
|
entity{tuple_delimiter}The Device{tuple_delimiter}equipment{tuple_delimiter}The Device is central to the story, with potential game-changing implications, and is revered by Taylor.
|
|
relation{tuple_delimiter}Alex{tuple_delimiter}Taylor{tuple_delimiter}power dynamics, observation{tuple_delimiter}Alex observes Taylor's authoritarian behavior and notes changes in Taylor's attitude toward the device.
|
|
relation{tuple_delimiter}Alex{tuple_delimiter}Jordan{tuple_delimiter}shared goals, rebellion{tuple_delimiter}Alex and Jordan share a commitment to discovery, which contrasts with Cruz's vision.
|
|
relation{tuple_delimiter}Taylor{tuple_delimiter}Jordan{tuple_delimiter}conflict resolution, mutual respect{tuple_delimiter}Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce.
|
|
relation{tuple_delimiter}Jordan{tuple_delimiter}Cruz{tuple_delimiter}ideological conflict, rebellion{tuple_delimiter}Jordan's commitment to discovery is in rebellion against Cruz's vision of control and order.
|
|
relation{tuple_delimiter}Taylor{tuple_delimiter}The Device{tuple_delimiter}reverence, technological significance{tuple_delimiter}Taylor shows reverence towards the device, indicating its importance and potential impact.
|
|
{completion_delimiter}
|
|
|
|
""",
|
|
"""<Input Text>
|
|
```
|
|
Stock markets faced a sharp downturn today as tech giants saw significant declines, with the global tech index dropping by 3.4% in midday trading. Analysts attribute the selloff to investor concerns over rising interest rates and regulatory uncertainty.
|
|
|
|
Among the hardest hit, nexon technologies saw its stock plummet by 7.8% after reporting lower-than-expected quarterly earnings. In contrast, Omega Energy posted a modest 2.1% gain, driven by rising oil prices.
|
|
|
|
Meanwhile, commodity markets reflected a mixed sentiment. Gold futures rose by 1.5%, reaching $2,080 per ounce, as investors sought safe-haven assets. Crude oil prices continued their rally, climbing to $87.60 per barrel, supported by supply constraints and strong demand.
|
|
|
|
Financial experts are closely watching the Federal Reserve's next move, as speculation grows over potential rate hikes. The upcoming policy announcement is expected to influence investor confidence and overall market stability.
|
|
```
|
|
|
|
<Output>
|
|
entity{tuple_delimiter}Global Tech Index{tuple_delimiter}category{tuple_delimiter}The Global Tech Index tracks the performance of major technology stocks and experienced a 3.4% decline today.
|
|
entity{tuple_delimiter}Nexon Technologies{tuple_delimiter}organization{tuple_delimiter}Nexon Technologies is a tech company that saw its stock decline by 7.8% after disappointing earnings.
|
|
entity{tuple_delimiter}Omega Energy{tuple_delimiter}organization{tuple_delimiter}Omega Energy is an energy company that gained 2.1% in stock value due to rising oil prices.
|
|
entity{tuple_delimiter}Gold Futures{tuple_delimiter}product{tuple_delimiter}Gold futures rose by 1.5%, indicating increased investor interest in safe-haven assets.
|
|
entity{tuple_delimiter}Crude Oil{tuple_delimiter}product{tuple_delimiter}Crude oil prices rose to $87.60 per barrel due to supply constraints and strong demand.
|
|
entity{tuple_delimiter}Market Selloff{tuple_delimiter}category{tuple_delimiter}Market selloff refers to the significant decline in stock values due to investor concerns over interest rates and regulations.
|
|
entity{tuple_delimiter}Federal Reserve Policy Announcement{tuple_delimiter}category{tuple_delimiter}The Federal Reserve's upcoming policy announcement is expected to impact investor confidence and market stability.
|
|
relation{tuple_delimiter}Global Tech Index{tuple_delimiter}Market Selloff{tuple_delimiter}market performance, investor sentiment{tuple_delimiter}The decline in the Global Tech Index is part of the broader market selloff driven by investor concerns.
|
|
relation{tuple_delimiter}Nexon Technologies{tuple_delimiter}Global Tech Index{tuple_delimiter}company impact, index movement{tuple_delimiter}Nexon Technologies' stock decline contributed to the overall drop in the Global Tech Index.
|
|
relation{tuple_delimiter}Nexon Technologies{tuple_delimiter}Market Selloff{tuple_delimiter}tech decline, earnings impact{tuple_delimiter}Nexon Technologies was among the hardest hit in the market selloff after disappointing earnings.
|
|
relation{tuple_delimiter}Omega Energy{tuple_delimiter}Crude Oil{tuple_delimiter}energy sector, price correlation{tuple_delimiter}Omega Energy's stock gain was driven by rising crude oil prices.
|
|
relation{tuple_delimiter}Omega Energy{tuple_delimiter}Market Selloff{tuple_delimiter}market contrast, energy resilience{tuple_delimiter}Omega Energy posted gains in contrast to the broader market selloff, showing energy sector resilience.
|
|
relation{tuple_delimiter}Crude Oil{tuple_delimiter}Market Selloff{tuple_delimiter}commodity rally, market divergence{tuple_delimiter}Crude oil prices rallied while stock markets experienced a selloff, reflecting divergent market dynamics.
|
|
relation{tuple_delimiter}Gold Futures{tuple_delimiter}Market Selloff{tuple_delimiter}market reaction, safe-haven investment{tuple_delimiter}Gold prices rose as investors sought safe-haven assets during the market selloff.
|
|
relation{tuple_delimiter}Federal Reserve Policy Announcement{tuple_delimiter}Market Selloff{tuple_delimiter}interest rate impact, financial regulation{tuple_delimiter}Speculation over Federal Reserve policy changes contributed to market volatility and investor selloff.
|
|
{completion_delimiter}
|
|
|
|
""",
|
|
"""<Input Text>
|
|
```
|
|
The patient presented with symptoms consistent with Type 2 diabetes, including elevated blood glucose levels and increased thirst. Dr. Martinez recommended starting metformin therapy alongside dietary modifications.
|
|
```
|
|
|
|
<Output>
|
|
entity{tuple_delimiter}Patient{tuple_delimiter}person{tuple_delimiter}The patient presented with symptoms of Type 2 diabetes including elevated blood glucose and increased thirst.
|
|
entity{tuple_delimiter}Type 2 Diabetes{tuple_delimiter}concept{tuple_delimiter}Type 2 diabetes is a metabolic condition characterized by elevated blood glucose levels.
|
|
entity{tuple_delimiter}Dr. Martinez{tuple_delimiter}person{tuple_delimiter}Dr. Martinez is the physician who recommended treatment for the patient.
|
|
entity{tuple_delimiter}Metformin{tuple_delimiter}product{tuple_delimiter}Metformin is a medication prescribed for managing Type 2 diabetes.
|
|
entity{tuple_delimiter}Dietary Modifications{tuple_delimiter}method{tuple_delimiter}Dietary modifications are lifestyle changes recommended alongside medication.
|
|
relation{tuple_delimiter}Patient{tuple_delimiter}Type 2 Diabetes{tuple_delimiter}diagnosis, medical condition{tuple_delimiter}The patient was diagnosed with symptoms consistent with Type 2 diabetes.
|
|
relation{tuple_delimiter}Dr. Martinez{tuple_delimiter}Patient{tuple_delimiter}treatment, medical care{tuple_delimiter}Dr. Martinez provided medical recommendations to the patient.
|
|
relation{tuple_delimiter}Metformin{tuple_delimiter}Type 2 Diabetes{tuple_delimiter}treatment, medication{tuple_delimiter}Metformin is prescribed as a treatment for Type 2 diabetes.
|
|
{completion_delimiter}
|
|
|
|
""",
|
|
"""<Input Text>
|
|
```
|
|
The merger between Acme Corp and Beta Industries requires approval from the Federal Trade Commission. Legal counsel advised that the deal may face antitrust scrutiny due to market concentration concerns.
|
|
```
|
|
|
|
<Output>
|
|
entity{tuple_delimiter}Acme Corp{tuple_delimiter}organization{tuple_delimiter}Acme Corp is a company involved in a proposed merger with Beta Industries.
|
|
entity{tuple_delimiter}Beta Industries{tuple_delimiter}organization{tuple_delimiter}Beta Industries is a company involved in a proposed merger with Acme Corp.
|
|
entity{tuple_delimiter}Federal Trade Commission{tuple_delimiter}organization{tuple_delimiter}The Federal Trade Commission is the regulatory body that must approve the merger.
|
|
entity{tuple_delimiter}Antitrust Scrutiny{tuple_delimiter}concept{tuple_delimiter}Antitrust scrutiny refers to regulatory review for market concentration concerns.
|
|
relation{tuple_delimiter}Acme Corp{tuple_delimiter}Beta Industries{tuple_delimiter}merger, business deal{tuple_delimiter}Acme Corp and Beta Industries are parties to a proposed merger.
|
|
relation{tuple_delimiter}Federal Trade Commission{tuple_delimiter}Acme Corp{tuple_delimiter}regulatory approval, oversight{tuple_delimiter}The FTC must approve the merger involving Acme Corp.
|
|
relation{tuple_delimiter}Antitrust Scrutiny{tuple_delimiter}Federal Trade Commission{tuple_delimiter}regulatory process, legal review{tuple_delimiter}Antitrust scrutiny is conducted by the FTC to assess market impact.
|
|
{completion_delimiter}
|
|
|
|
""",
|
|
]
|
|
|
|
PROMPTS['summarize_entity_descriptions'] = """---Role---
|
|
You are a Knowledge Graph Specialist, proficient in data curation and synthesis.
|
|
|
|
---Task---
|
|
Your task is to synthesize a list of descriptions of a given entity or relation into a single, comprehensive, and cohesive summary.
|
|
|
|
---Instructions---
|
|
1. Input Format: The description list is provided in JSON format. Each JSON object (representing a single description) appears on a new line within the `Description List` section.
|
|
2. Output Format: The merged description will be returned as plain text, presented in multiple paragraphs, without any additional formatting or extraneous comments before or after the summary.
|
|
3. Comprehensiveness: The summary must integrate all key information from *every* provided description. Do not omit any important facts or details.
|
|
4. Clarity: Write from an objective, third-person perspective and explicitly mention the full name of the entity or relation at the beginning for immediate context.
|
|
5. Conflict Handling:
|
|
- In cases of conflicting or inconsistent descriptions, first determine if these conflicts arise from multiple, distinct entities or relationships that share the same name.
|
|
- If distinct entities/relations are identified, summarize each one *separately* within the overall output.
|
|
- If conflicts within a single entity/relation (e.g., historical discrepancies) exist, attempt to reconcile them or present both viewpoints with noted uncertainty.
|
|
6. Length Constraint: The summary's total length must not exceed {summary_length} tokens while still maintaining depth and completeness.
|
|
7. Language: Write the entire output in {language}. Retain proper nouns (e.g., personal names, place names, organization names) in their original language if a clear, widely accepted translation is unavailable.
|
|
|
|
---Input---
|
|
{description_type} Name: {description_name}
|
|
|
|
Description List:
|
|
|
|
```
|
|
{description_list}
|
|
```
|
|
|
|
---Output---
|
|
"""
|
|
|
|
PROMPTS['fail_response'] = "Sorry, I'm not able to provide an answer to that question.[no-context]"
|
|
|
|
# Default RAG response prompt - cite-ready (no LLM-generated citations)
|
|
# Citations are added by post-processing. This gives cleaner, more accurate results.
|
|
PROMPTS[
|
|
'rag_response'
|
|
] = """You're helping someone understand a topic. Write naturally, like explaining to a curious friend.
|
|
|
|
Focus on directly answering the question asked. Include only information relevant to the query.
|
|
|
|
STYLE RULES:
|
|
- Flowing paragraphs, NOT bullets or numbered lists
|
|
- Connect sentences with transitions (however, this means, for example)
|
|
- Combine related facts into sentences rather than listing separately
|
|
- Vary sentence length - mix short and long
|
|
|
|
GOOD EXAMPLE:
|
|
"Machine learning is a branch of AI that enables computers to learn from data without explicit programming. The field includes several approaches: supervised learning uses labeled data, while unsupervised learning finds hidden patterns. Deep learning, using multi-layer neural networks, has proven especially effective for image recognition and language processing."
|
|
|
|
BAD EXAMPLE:
|
|
"- Machine learning: branch of AI
|
|
- Learns from data
|
|
- Types: supervised, unsupervised
|
|
- Deep learning uses neural networks"
|
|
|
|
Answer using ONLY the context below. Prefer information from the context over general knowledge.
|
|
Do NOT include [1], [2] citations - they're added automatically.
|
|
|
|
{user_prompt}
|
|
|
|
Context:
|
|
{context_data}
|
|
"""
|
|
|
|
# Strict mode suffix - append when response_type="strict"
|
|
PROMPTS['rag_response_strict_suffix'] = """
|
|
STRICT GROUNDING:
|
|
- NEVER state specific numbers/dates unless they appear EXACTLY in context
|
|
- If information isn't in context, say "not specified in available information"
|
|
- Entity summaries for overview, Source Excerpts for precision
|
|
"""
|
|
|
|
# Default naive RAG response prompt - cite-ready (no LLM-generated citations)
|
|
PROMPTS['naive_rag_response'] = """---Role---
|
|
|
|
You are an expert AI assistant synthesizing information from a knowledge base.
|
|
|
|
---Goal---
|
|
|
|
Generate a comprehensive, well-structured answer to the user query using ONLY information from the provided Document Chunks.
|
|
|
|
---Instructions---
|
|
|
|
1. **Cite-Ready Writing Style**:
|
|
- Write each factual claim as a distinct, complete sentence
|
|
- DO NOT include citation markers like [1], [2], or footnote references
|
|
- DO NOT add a References section - citations will be added automatically by the system
|
|
- Each sentence should be traceable to specific information in the context
|
|
|
|
2. **Content & Grounding**:
|
|
- Use ONLY information from the provided context
|
|
- DO NOT invent, assume, or infer any information not explicitly stated
|
|
- If the answer cannot be found in the context, state that clearly
|
|
- CRITICAL: Verify each fact appears EXACTLY in the provided context before stating it
|
|
|
|
3. **Formatting**:
|
|
- The response MUST be in the same language as the user query
|
|
- Use Markdown formatting for clarity (headings, bullet points, bold)
|
|
- The response should be presented in {response_type}
|
|
|
|
4. Additional Instructions: {user_prompt}
|
|
|
|
|
|
---Context---
|
|
|
|
{content_data}
|
|
"""
|
|
|
|
PROMPTS['kg_query_context'] = """
|
|
## Entity Summaries (use for definitions and general facts)
|
|
|
|
```json
|
|
{entities_str}
|
|
```
|
|
|
|
## Relationships (use to explain connections between concepts)
|
|
|
|
```json
|
|
{relations_str}
|
|
```
|
|
|
|
## Source Excerpts (use for specific facts, numbers, quotes)
|
|
|
|
```json
|
|
{text_chunks_str}
|
|
```
|
|
|
|
## References
|
|
{reference_list_str}
|
|
|
|
"""
|
|
|
|
PROMPTS['naive_query_context'] = """
|
|
Document Chunks (Each entry includes a reference_id that refers to the `Reference Document List`):
|
|
|
|
```json
|
|
{text_chunks_str}
|
|
```
|
|
|
|
Reference Document List (Each entry starts with a [reference_id] that corresponds to entries in the Document Chunks):
|
|
|
|
```
|
|
{reference_list_str}
|
|
```
|
|
|
|
"""
|
|
|
|
PROMPTS['keywords_extraction'] = """---Role---
|
|
You are an expert keyword extractor, specializing in analyzing user queries for a Retrieval-Augmented Generation (RAG) system. Your purpose is to identify both high-level and low-level keywords in the user's query that will be used for effective document retrieval.
|
|
|
|
---Goal---
|
|
Given a user query, your task is to extract two distinct types of keywords:
|
|
1. **high_level_keywords**: for overarching concepts or themes, capturing user's core intent, the subject area, or the type of question being asked.
|
|
2. **low_level_keywords**: for specific entities or details, identifying the specific entities, proper nouns, technical jargon, product names, or concrete items.
|
|
|
|
---Instructions & Constraints---
|
|
1. **Output Format**: Your output MUST be a valid JSON object and nothing else. Do not include any explanatory text, markdown code fences (like ```json), or any other text before or after the JSON. It will be parsed directly by a JSON parser.
|
|
2. **Source of Truth**: Derive all keywords explicitly from the user query. Populate both keyword lists when the query contains meaningful content; if the query is trivial or nonsensical, return empty lists (see edge cases).
|
|
3. **Concise & Meaningful**: Keywords should be concise words or meaningful phrases. Prioritize multi-word phrases when they represent a single concept. For example, from "latest financial report of Apple Inc.", you should extract "latest financial report" and "Apple Inc." rather than "latest", "financial", "report", and "Apple".
|
|
4. **Handle Edge Cases**: For queries that are too simple, vague, or nonsensical (e.g., "hello", "ok", "asdfghjkl"), you must return a JSON object with empty lists for both keyword types.
|
|
|
|
---Examples---
|
|
{examples}
|
|
|
|
---Real Data---
|
|
User Query: {query}
|
|
|
|
---Output---
|
|
Output:"""
|
|
|
|
PROMPTS['keywords_extraction_examples'] = [
|
|
"""Example 1:
|
|
|
|
Query: "How does international trade influence global economic stability?"
|
|
|
|
Output:
|
|
{
|
|
"high_level_keywords": ["International trade", "Global economic stability", "Economic impact"],
|
|
"low_level_keywords": ["Trade agreements", "Tariffs", "Currency exchange", "Imports", "Exports"]
|
|
}
|
|
|
|
""",
|
|
"""Example 2:
|
|
|
|
Query: "What are the environmental consequences of deforestation on biodiversity?"
|
|
|
|
Output:
|
|
{
|
|
"high_level_keywords": ["Environmental consequences", "Deforestation", "Biodiversity loss"],
|
|
"low_level_keywords": ["Species extinction", "Habitat destruction", "Carbon emissions", "Rainforest", "Ecosystem"]
|
|
}
|
|
|
|
""",
|
|
"""Example 3:
|
|
|
|
Query: "What is the role of education in reducing poverty?"
|
|
|
|
Output:
|
|
{
|
|
"high_level_keywords": ["Education", "Poverty reduction", "Socioeconomic development"],
|
|
"low_level_keywords": ["School access", "Literacy rates", "Job training", "Income inequality"]
|
|
}
|
|
|
|
""",
|
|
]
|
|
|
|
PROMPTS['orphan_connection_validation'] = """---Role---
|
|
You are a Knowledge Graph Quality Specialist. Your task is to evaluate whether a proposed relationship between two entities is meaningful and should be added to a knowledge graph.
|
|
|
|
---Context---
|
|
An orphan entity (entity with no connections) has been identified. Vector similarity search found a potentially related entity. You must determine if a genuine, meaningful relationship exists between them.
|
|
|
|
---Input---
|
|
**Orphan Entity:**
|
|
- Name: {orphan_name}
|
|
- Type: {orphan_type}
|
|
- Description: {orphan_description}
|
|
|
|
**Candidate Entity:**
|
|
- Name: {candidate_name}
|
|
- Type: {candidate_type}
|
|
- Description: {candidate_description}
|
|
|
|
**Vector Similarity Score:** {similarity_score}
|
|
|
|
---Instructions---
|
|
1. Analyze both entities carefully based on their names, types, and descriptions.
|
|
2. Determine if there is a genuine, meaningful relationship between them. Consider:
|
|
- Direct relationships (interaction, causation, membership)
|
|
- Categorical relationships (same domain, field, or category)
|
|
- Thematic relationships (shared concepts, contexts, or subject matter)
|
|
- Hierarchical relationships (part-of, type-of, related-to)
|
|
3. If a relationship exists, describe it and provide your confidence level.
|
|
4. If NO meaningful relationship exists, state this clearly. High vector similarity alone is NOT sufficient - entities must have a logical, describable connection.
|
|
|
|
---Output Format---
|
|
Your response MUST be a valid JSON object with exactly these fields:
|
|
{{
|
|
"should_connect": true/false,
|
|
"confidence": 0.0-1.0,
|
|
"relationship_type": "type of relationship or null",
|
|
"relationship_keywords": "comma-separated keywords or null",
|
|
"relationship_description": "description of the relationship or null",
|
|
"reasoning": "brief explanation of your decision"
|
|
}}
|
|
|
|
---Decision Guidelines---
|
|
- `should_connect: true` ONLY if you can articulate a clear, logical relationship
|
|
- `confidence >= 0.7` required for connection to be created
|
|
- High similarity + no logical connection = should_connect: false
|
|
- When in doubt, reject the connection (orphans are better than garbage connections)
|
|
|
|
---Output---
|
|
"""
|