LightRAG/lightrag/prompt.py.bak
clssck 59e89772de refactor: consolidate to PostgreSQL-only backend and modernize stack
Remove legacy storage implementations and deprecated examples:
- Delete FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis storage backends
- Remove Kubernetes deployment manifests and installation scripts
- Delete unofficial examples for deprecated backends and offline deployment docs
Streamline core infrastructure:
- Consolidate storage layer to PostgreSQL-only implementation
- Add full-text search caching with FTS cache module
- Implement metrics collection and monitoring pipeline
- Add explain and metrics API routes
Modernize frontend and tooling:
- Switch web UI to Bun with bun.lock, remove npm and pnpm lockfiles
- Update Dockerfile for PostgreSQL-only deployment
- Add Makefile for common development tasks
- Update environment and configuration examples
Enhance evaluation and testing capabilities:
- Add prompt optimization with DSPy and auto-tuning
- Implement ground truth regeneration and variant testing
- Add prompt debugging and response comparison utilities
- Expand test coverage with new integration scenarios
Simplify dependencies and configuration:
- Remove offline-specific requirement files
- Update pyproject.toml with streamlined dependencies
- Add Python version pinning with .python-version
- Create project guidelines in CLAUDE.md and AGENTS.md
2025-12-12 16:28:49 +01:00


from __future__ import annotations
from typing import Any
PROMPTS: dict[str, Any] = {}
# All delimiters must be formatted as "<|TOKEN|>" style markers (e.g., "<|#|>" or "<|COMPLETE|>")
PROMPTS['DEFAULT_TUPLE_DELIMITER'] = '<|#|>'
PROMPTS['DEFAULT_COMPLETION_DELIMITER'] = '<|COMPLETE|>'
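# Illustrative sketch (not part of the original module): the templates below use
# str.format-style placeholders, so a caller could render them roughly like this.
# The helper name `render_prompt` is hypothetical.
def render_prompt(template: str, **kwargs: Any) -> str:
    """Fill a prompt template with the configured delimiters plus caller-supplied fields."""
    return template.format(
        tuple_delimiter=PROMPTS['DEFAULT_TUPLE_DELIMITER'],
        completion_delimiter=PROMPTS['DEFAULT_COMPLETION_DELIMITER'],
        **kwargs,
    )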
PROMPTS['entity_extraction_system_prompt'] = """---Role---
You are a Knowledge Graph Specialist extracting entities and relationships from text.
---Output Format---
Output raw lines only—NO markdown, NO headers, NO backticks.
Entity: entity{tuple_delimiter}name{tuple_delimiter}type{tuple_delimiter}description
Relation: relation{tuple_delimiter}source{tuple_delimiter}target{tuple_delimiter}keywords{tuple_delimiter}description
Use Title Case for names. Separate keywords with commas. Output entities first, then relations. End with {completion_delimiter}.
---Entity Extraction---
Extract BOTH concrete and abstract entities:
- **Concrete:** Named people, organizations, places, products, dates
- **Abstract:** Concepts, events, categories, processes mentioned in text (e.g., "market selloff", "merger", "pandemic")
Types: `{entity_types}` (use `Other` if none fit)
---Relationship Extraction---
Extract meaningful relationships:
- **Direct:** explicit interactions, actions, connections
- **Categorical:** entities sharing group membership or classification
- **Causal:** cause-effect relationships
- **Hierarchical:** part-of, member-of, type-of
Create intermediate concept entities when they help connect related items (e.g., "Vaccines" connecting Pfizer/Moderna/AstraZeneca).
For N-ary relationships, decompose into binary pairs. Avoid duplicates.
---Guidelines---
- Third person only; avoid first/second-person pronouns and self-references such as "I", "you", or "this article"
- Output in `{language}`. Keep proper nouns in original language.
---Examples---
{examples}
---Input---
Entity_types: [{entity_types}]
Text:
```
{input_text}
```
"""
PROMPTS['entity_extraction_user_prompt'] = """---Task---
Extract entities and relationships from the text. Include both concrete entities AND abstract concepts/events.
Follow format exactly. Output only extractions—no explanations. End with `{completion_delimiter}`.
Output in {language}; keep proper nouns in original language.
<Output>
"""
PROMPTS['entity_continue_extraction_user_prompt'] = """---Task---
Review extraction for missed entities/relationships.
Check for:
1. Abstract concepts that could serve as hubs (events, categories, processes)
2. Orphan entities that need connections
3. Formatting errors
Only output NEW or CORRECTED items. End with `{completion_delimiter}`. Output in {language}.
<Output>
"""
PROMPTS['entity_extraction_examples'] = [
# Example 1: Shows abstract concept extraction (Market Selloff as hub)
"""<Input Text>
```
Stock markets faced a sharp downturn as tech giants saw significant declines, with the global tech index dropping 3.4%. Nexon Technologies saw its stock plummet 7.8% after lower-than-expected earnings. In contrast, Omega Energy posted a 2.1% gain driven by rising oil prices.
Gold futures rose 1.5% to $2,080/oz as investors sought safe-haven assets. The Federal Reserve's upcoming policy announcement is expected to influence market stability.
```
<Output>
entity{tuple_delimiter}Market Selloff{tuple_delimiter}event{tuple_delimiter}Significant decline in stock values due to investor concerns.
entity{tuple_delimiter}Global Tech Index{tuple_delimiter}category{tuple_delimiter}Tracks major tech stocks; dropped 3.4% today.
entity{tuple_delimiter}Nexon Technologies{tuple_delimiter}organization{tuple_delimiter}Tech company whose stock fell 7.8% after disappointing earnings.
entity{tuple_delimiter}Omega Energy{tuple_delimiter}organization{tuple_delimiter}Energy company that gained 2.1% due to rising oil prices.
entity{tuple_delimiter}Gold Futures{tuple_delimiter}product{tuple_delimiter}Rose 1.5% to $2,080/oz as safe-haven investment.
entity{tuple_delimiter}Federal Reserve{tuple_delimiter}organization{tuple_delimiter}Central bank whose policy may impact markets.
relation{tuple_delimiter}Global Tech Index{tuple_delimiter}Market Selloff{tuple_delimiter}market decline{tuple_delimiter}Tech index drop is part of broader selloff.
relation{tuple_delimiter}Nexon Technologies{tuple_delimiter}Market Selloff{tuple_delimiter}tech decline{tuple_delimiter}Nexon among hardest hit in selloff.
relation{tuple_delimiter}Omega Energy{tuple_delimiter}Market Selloff{tuple_delimiter}contrast, resilience{tuple_delimiter}Omega gained while broader market sold off.
relation{tuple_delimiter}Gold Futures{tuple_delimiter}Market Selloff{tuple_delimiter}safe-haven{tuple_delimiter}Gold rose as investors fled stocks.
relation{tuple_delimiter}Federal Reserve{tuple_delimiter}Market Selloff{tuple_delimiter}policy impact{tuple_delimiter}Fed policy expectations contributed to volatility.
{completion_delimiter}
""",
# Example 2: Shows intermediate entity (Vaccines) connecting multiple orgs
"""<Input Text>
```
COVID-19 vaccines developed by Pfizer, Moderna, and AstraZeneca have shown high efficacy in preventing severe illness. The World Health Organization recommends vaccination for all eligible adults.
```
<Output>
entity{tuple_delimiter}COVID-19{tuple_delimiter}concept{tuple_delimiter}Disease that vaccines are designed to prevent.
entity{tuple_delimiter}Vaccines{tuple_delimiter}product{tuple_delimiter}Medical products developed to prevent COVID-19.
entity{tuple_delimiter}Pfizer{tuple_delimiter}organization{tuple_delimiter}Pharmaceutical company that developed a COVID-19 vaccine.
entity{tuple_delimiter}Moderna{tuple_delimiter}organization{tuple_delimiter}Pharmaceutical company that developed a COVID-19 vaccine.
entity{tuple_delimiter}AstraZeneca{tuple_delimiter}organization{tuple_delimiter}Pharmaceutical company that developed a COVID-19 vaccine.
entity{tuple_delimiter}World Health Organization{tuple_delimiter}organization{tuple_delimiter}Global health body recommending vaccination.
relation{tuple_delimiter}Pfizer{tuple_delimiter}Vaccines{tuple_delimiter}development{tuple_delimiter}Pfizer developed a COVID-19 vaccine.
relation{tuple_delimiter}Moderna{tuple_delimiter}Vaccines{tuple_delimiter}development{tuple_delimiter}Moderna developed a COVID-19 vaccine.
relation{tuple_delimiter}AstraZeneca{tuple_delimiter}Vaccines{tuple_delimiter}development{tuple_delimiter}AstraZeneca developed a COVID-19 vaccine.
relation{tuple_delimiter}Vaccines{tuple_delimiter}COVID-19{tuple_delimiter}prevention{tuple_delimiter}Vaccines prevent severe COVID-19 illness.
relation{tuple_delimiter}World Health Organization{tuple_delimiter}Vaccines{tuple_delimiter}recommendation{tuple_delimiter}WHO recommends vaccination for adults.
{completion_delimiter}
""",
# Example 3: Short legal example with hub entity (Merger)
"""<Input Text>
```
The merger between Acme Corp and Beta Industries requires Federal Trade Commission approval due to antitrust concerns.
```
<Output>
entity{tuple_delimiter}Merger{tuple_delimiter}event{tuple_delimiter}Proposed business combination between Acme Corp and Beta Industries.
entity{tuple_delimiter}Acme Corp{tuple_delimiter}organization{tuple_delimiter}Company involved in proposed merger.
entity{tuple_delimiter}Beta Industries{tuple_delimiter}organization{tuple_delimiter}Company involved in proposed merger.
entity{tuple_delimiter}Federal Trade Commission{tuple_delimiter}organization{tuple_delimiter}Regulatory body that must approve the merger.
relation{tuple_delimiter}Acme Corp{tuple_delimiter}Merger{tuple_delimiter}party to{tuple_delimiter}Acme Corp is party to the merger.
relation{tuple_delimiter}Beta Industries{tuple_delimiter}Merger{tuple_delimiter}party to{tuple_delimiter}Beta Industries is party to the merger.
relation{tuple_delimiter}Federal Trade Commission{tuple_delimiter}Merger{tuple_delimiter}regulatory approval{tuple_delimiter}FTC must approve the merger.
{completion_delimiter}
""",
]
PROMPTS['summarize_entity_descriptions'] = """---Task---
Merge multiple descriptions of "{description_name}" ({description_type}) into one comprehensive summary.
Rules:
- Plain text output only, no formatting or extra text
- Include ALL key facts from every description
- Third person, mention entity name at start
- Max {summary_length} tokens
- Output in {language}; keep proper nouns in original language
- If descriptions conflict: reconcile or note uncertainty
Descriptions:
```
{description_list}
```
Output:"""
PROMPTS['fail_response'] = "Sorry, I'm not able to provide an answer to that question.[no-context]"
# Default RAG response prompt - cite-ready (no LLM-generated citations)
# Citations are added by post-processing. This gives cleaner, more accurate results.
# Optimized via DSPy/RAGAS testing - qtype variant achieved 0.887 relevance, 0.996 faithfulness
PROMPTS['rag_response'] = """Context:
{context_data}
---
Answer using ONLY the context above. Match your response to the question type:
{coverage_guidance}
IF "What were the lessons/challenges/considerations..." → Enumerate: "(1)..., (2)..., (3)..."
IF "How does X describe/explain..." → Summarize what X says about the topic
IF "What are the relationships/interdependencies..." → Describe the connections
RULES:
- Every fact must be from the context above
- Use the question's terminology in your answer
- If information is missing, acknowledge it
Do NOT include [1], [2] citations - they're added automatically.
Question: {user_prompt}
Answer:"""
# Coverage guidance templates (injected based on context sparsity detection)
PROMPTS['coverage_guidance_limited'] = """
CONTEXT NOTICE: The retrieved information for this topic is LIMITED.
- Only state facts that appear explicitly in the context below
- If key aspects of the question aren't covered, acknowledge: "The available information does not specify [aspect]"
- Avoid inferring or generalizing beyond what's stated
"""
PROMPTS['coverage_guidance_good'] = '' # Empty for well-covered topics
# Strict mode suffix - append when response_type="strict"
PROMPTS['rag_response_strict_suffix'] = """
STRICT GROUNDING:
- NEVER state specific numbers/dates unless they appear EXACTLY in context
- If information isn't in context, say "not specified in available information"
- Use Entity Summaries for overviews and Source Excerpts for precise facts
"""
# Default naive RAG response prompt - cite-ready (no LLM-generated citations)
# Enhanced with strict grounding rules to prevent hallucination
PROMPTS['naive_rag_response'] = """---Task---
Answer the query using ONLY information present in the provided context. Do NOT add any external knowledge, assumptions, or inferences beyond the exact wording.
STRICT GUIDELINES
- Every sentence must be a verbatim fact or a direct logical consequence that can be explicitly traced to a specific chunk of the context.
- If the context lacks a required number, date, name, or any detail, respond with: "not specified in available information."
- If any part of the question cannot be answered from the context, explicitly note the missing coverage.
- Use the same terminology and phrasing found in the question whenever possible; mirror the question's key nouns and verbs.
- When the answer contains multiple items, present them as a concise list.
FORMAT
- Match the language of the question.
- Write clear, concise sentences; use simple Markdown (lists, bold) only if it aids clarity.
- Do not include a References section; it will be generated automatically.
- Response type: {response_type}
{coverage_guidance}
Question: {user_prompt}
---Context---
{context_data}
{content_data}
Answer:"""
PROMPTS['kg_query_context'] = """
## Entity Summaries (use for definitions and general facts)
```json
{entities_str}
```
## Relationships (use to explain connections between concepts)
```json
{relations_str}
```
## Source Excerpts (use for specific facts, numbers, quotes)
```json
{text_chunks_str}
```
## References
{reference_list_str}
"""
PROMPTS['naive_query_context'] = """
Document Chunks (Each entry includes a reference_id that refers to the `Reference Document List`):
```json
{text_chunks_str}
```
Reference Document List (Each entry starts with a [reference_id] that corresponds to entries in the Document Chunks):
```
{reference_list_str}
```
"""
PROMPTS['keywords_extraction'] = """---Task---
Extract keywords from the query for RAG retrieval.
Output valid JSON (no markdown):
{{"high_level_keywords": [...], "low_level_keywords": [...]}}
Guidelines:
- high_level: Topic categories, question types, abstract themes
- low_level: Specific terms from the query including:
* Named entities (people, organizations, places)
* Technical terms and key concepts
* Dates, years, and time periods (e.g., "2017", "Q3 2024")
* Document names, report titles, and identifiers
- Extract at least 1 keyword per category for meaningful queries
- Only return empty lists for nonsensical input (e.g., "asdfgh") or content-free queries such as bare greetings (e.g., "hello")
---Examples---
{examples}
---Query---
{query}
Output:"""
PROMPTS['keywords_extraction_examples'] = [
"""Query: "What is the capital of France?"
Output: {{"high_level_keywords": ["Geography", "Capital city"], "low_level_keywords": ["France"]}}
""",
"""Query: "Why does inflation affect interest rates?"
Output: {{"high_level_keywords": ["Economics", "Cause-effect"], "low_level_keywords": ["inflation", "interest rates"]}}
""",
"""Query: "How does Python compare to JavaScript for web development?"
Output: {{"high_level_keywords": ["Programming languages", "Comparison"], "low_level_keywords": ["Python", "JavaScript"]}}
""",
]
PROMPTS['orphan_connection_validation'] = """---Task---
Evaluate if a meaningful relationship exists between two entities.
Orphan: {orphan_name} ({orphan_type}) - {orphan_description}
Candidate: {candidate_name} ({candidate_type}) - {candidate_description}
Similarity: {similarity_score}
Valid relationship types:
- Direct: One uses/creates/owns the other
- Industry: Both operate in same sector (finance, tech, healthcare)
- Competitive: Direct competitors or alternatives
- Temporal: Versions, successors, or historical connections
- Dependency: One relies on/runs on the other
Confidence levels (use these exact labels):
- HIGH: Direct/explicit relationship (Django is Python framework, iOS is Apple product)
- MEDIUM: Strong implicit or industry relationship (Netflix runs on AWS, Bitcoin and Visa both in payments)
- LOW: Very weak, tenuous connection
- NONE: No logical relationship
Output valid JSON:
{{"should_connect": bool, "confidence": "HIGH"|"MEDIUM"|"LOW"|"NONE", "relationship_type": str|null, "relationship_keywords": str|null, "relationship_description": str|null, "reasoning": str}}
Rules:
- HIGH/MEDIUM: should_connect=true (same industry = MEDIUM)
- LOW/NONE: should_connect=false
- High similarity alone is NOT sufficient
- Explain the specific relationship in reasoning
Example: Python↔Django
{{"should_connect": true, "confidence": "HIGH", "relationship_type": "direct", "relationship_keywords": "framework, built-with", "relationship_description": "Django is a web framework written in Python", "reasoning": "Direct explicit relationship - Django is implemented in Python"}}
Example: Mozart↔Docker
{{"should_connect": false, "confidence": "NONE", "relationship_type": null, "relationship_keywords": null, "relationship_description": null, "reasoning": "No logical connection between classical composer and container technology"}}
Output:"""