Commit graph

123 commits

Author SHA1 Message Date
yangdx
294f75438e Restructure entity extraction prompt format for consistency
• Move entity_types to user prompt
• Add XML-style formatting tags
• Update examples with entity_types
2025-12-11 19:12:34 +08:00
Ghazi-raad
56677ae466
Update lightrag/prompt.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-26 23:18:12 +00:00
Ghazi-raad
207af40f54 Optimize for OpenAI Prompt Caching: Restructure entity extraction prompts
- Remove input_text from entity_extraction_system_prompt to enable caching
- Move input_text to entity_extraction_user_prompt for per-chunk variability
- Update operate.py to format system prompt once without input_text
- Format user prompts with input_text for each chunk

This enables OpenAI's automatic prompt caching (50% discount on cached tokens):
- ~1300 token system message cached and reused for ALL chunks
- Only ~150 token user message varies per chunk
- Expected 45% cost reduction on prompt tokens during indexing
- 2-3x faster response times from cached prompts

Fixes #2355
2025-11-26 21:56:25 +00:00
Daniel.y
d392db7b4a
Fix typo in 'equipment' in prompt.py 2025-10-22 11:13:22 +08:00
yangdx
6bf6f43d96 Remove bold formatting from instruction headers in prompts 2025-10-02 00:58:03 +08:00
yangdx
bb6138e748 fix(prompt): Clarify reference section restrictions in prompt templates 2025-10-01 22:35:26 +08:00
yangdx
37e8898cf6 Simplify reference formatting in LLM context generation
- Remove extra newlines in reference lists
- Change code block type from text to generic
2025-10-01 22:20:58 +08:00
yangdx
f83cde14df fix(prompt): Improve markdown formatting requirements and reference style 2025-10-01 21:41:12 +08:00
yangdx
0fd0186414 Improve prompt clarity by standardizing terminology and formatting
• Replace "Source Data" with "Context"
• Add bold formatting for key sections
• Clarify reference_id usage
• Improve JSON/text block formatting
• Standardize data source naming
2025-09-28 13:31:55 +08:00
yangdx
cbdc4c4bdf Refactor prompts and context building for better maintainability
- Extract context templates to PROMPTS
- Unify token calculation logic
- Simplify user_prompt formatting
- Reduce code duplication
- Improve prompt structure consistency
2025-09-26 12:39:06 +08:00
yangdx
fba2356c81 Move user_prompt to system prompt
- Refactor query prompt handling to separate user prompts in system context
- Simplify user_query to only contain query
- Apply changes to both kg_query and naive_query
2025-09-26 10:02:01 +08:00
yangdx
058ce83dba Clarify citation format and fix typo 2025-09-25 20:08:55 +08:00
yangdx
41a6da6786 Remove inline citation instructions from prompt templates
- Remove footnote syntax guidelines
- Delete inline citation examples
- Keep references section format
- Simplify citation documentation
- Update example section titles
2025-09-25 03:46:30 +08:00
yangdx
14bbafa146 Improve inline citation format and add examples to prompts
- Clarify single caret rule for citations
- Add citation format examples
- Rename to "References Section Format"
- Improve multi-citation instructions
2025-09-25 03:26:50 +08:00
yangdx
6177878812 Add inline citation format with footnote syntax to prompts
- Add footnote syntax `[^1]` for citations
- Support multiple citations `[^1,2,3]`
- Update reference section examples
- Enforce caret symbol requirement
- Match reference_id in brackets
2025-09-25 02:51:12 +08:00
yangdx
f610bd5d21 Update citation format to use bullet points and add examples
- Change citation format to `* [n]`
- Add reference section examples
- Apply to both prompt templates
- Improve formatting consistency
2025-09-24 21:59:21 +08:00
yangdx
e9503ee6ae Merge branch 'patch-1' into citation-optimization 2025-09-24 18:23:29 +08:00
yangdx
ac26f3a2f2 Refactor citation format from file paths to numbered document titles
• Change citation format to [n] style
• Reduce max citations from 6 to 5
• Add reference tracking instructions
• Simplify citation merge logic
• Remove inline citation requirements
2025-09-24 14:30:53 +08:00
SASon
b3cc0127d9
Fix typo in output language instruction 2025-09-24 13:22:35 +09:00
SASon
746d4c576d
Fix typo in output language instruction
from Oputput to Output
2025-09-24 13:17:37 +09:00
yangdx
5fa92cbf99 Improve citation quality and reduce reference limits in prompts
- Reduce max citations from 8 to 6
- Require direct fact referencing
- Clarify relevance prioritization
2025-09-22 10:53:03 +08:00
yangdx
8826d2f892 Optimize prompt instruction for citation format 2025-09-22 01:04:57 +08:00
yangdx
2f06f851c3 Enhance citation format with merged references and clearer guidelines
- Increase max references from 5 to 8
- Merge citations by file_path
- Remove inline citations from body
- Add reference section examples
- Update citation prefixes (KG→EN, RE)
2025-09-21 22:48:48 +08:00
yangdx
f88c2fbdff Refactor citation format instructions for clarity and consistency 2025-09-21 15:51:31 +08:00
yangdx
8f0fb3c9eb Include user query in prompt returns 2025-09-21 15:24:20 +08:00
yangdx
6eb37e270a Refactor query handling and improve RAG response prompts
- Move user_prompt to query concatenation
- Remove DEFAULT_USER_PROMPT constant
- Enhance prompt clarity and structure
- Standardize citation formatting
- Improve step-by-step instructions
2025-09-21 15:16:24 +08:00
yangdx
f69c5dfd9a Add language control and format clarity to extraction prompts 2025-09-14 18:26:41 +08:00
yangdx
6e37460964 Improve entity extraction prompt clarity and make sure LLM output content only 2025-09-14 17:50:56 +08:00
yangdx
4de1473875 Improve entity extraction prompts and error message formatting
• Fix typo in error log message
• Clarify format requirements in prompts
• Make extraction instructions clearer
• Improve user prompt consistency
2025-09-14 13:45:59 +08:00
yangdx
fd48afdb00 Use "relation" instead of "relationship" in extration prompt, and support both format for safty 2025-09-14 11:43:35 +08:00
yangdx
d993464a92 Restructure entity extraction prompt with clearer formatting and examples
* Improved instruction clarity
* Added better formatting structure
* Enhanced delimiter usage rules
* Clarified relationship handling
* Better third-person guidelines
2025-09-14 02:30:32 +08:00
yangdx
2686fc526e Change entity type from CreativeWork to Content and update delimiter
• Replace CreativeWork with Content type
• Improve LLM output error messages
• Update prompt for binary relationships
• Fix delimiter corruption examples
2025-09-14 00:55:15 +08:00
yangdx
4a5ab5121d Change delimiter from <|S|> to <|#|> and clarify formatting rules 2025-09-13 22:58:56 +08:00
yangdx
bf423a4ce1 Clarify output structure in prompt instructions by adding field count specifications 2025-09-13 09:51:33 +08:00
yangdx
369f799b16 Refine entity extraction prompts for clarity and consistency
• Clarify tuple delimiter usage
• Soften proper noun translation rules
• Standardize language requirements
• Improve output format consistency
2025-09-13 08:14:46 +08:00
yangdx
0221213b9b Improve entity summarization with JSONL format and fix tuple delimiters
• Convert descriptions to JSONL format
• Add token-based truncation helper
• Enhance entity name consistency rules
• Improve summarization prompt clarity
• Fix tuple delimiter corruption patterns
2025-09-12 12:32:08 +08:00
yangdx
1892ed23cc Change tuple delimiter from <|SEP|> to <|S|> across codebase
• Update prompt instruction clarity
• Correct utility function examples
• Update regex pattern comments
2025-09-12 08:57:46 +08:00
yangdx
b96f1484ec Shorten tuple delimiter to <|S|> and refine relationship extraction text
• Remove redundant "within input text"
• Clarify relationship extraction scope
2025-09-12 08:36:43 +08:00
yangdx
40688def20 Refactor tuple delimiter corruption fix into reusable utility function
- Extract regex fixes to utils module
- Add case-insensitive delimiter handling
2025-09-12 04:10:14 +08:00
yangdx
7f83a58497 Refactor extraction delimiters from ## to newlines and change tuple delimiter to <|SEP|>
• Add robust delimiter fixing logic
• Update prompts for single-line format
2025-09-11 13:44:44 +08:00
yangdx
02e7462645 feat: enhance LLM output format tolerance for bracket processing
- Expand bracket tolerance to support additional characters: < > " '
- Implement symmetric handling for both leading and trailing characters
- Replace simple string matching with robust regex-based pattern detection
- Maintain full backward compatibility with existing bracket formats
2025-09-10 18:10:06 +08:00
yangdx
50fddeebbf fix: Remove conversation history from prompt template
- Delete history section from prompt
- Simplify user query response format
- Remove {history} placeholder variable
2025-09-10 12:07:34 +08:00
yangdx
2dd143c935 Refactor conversation history handling to use LLM native message format
• Remove get_conversation_turns utility
• Pass history_messages to LLM directly
• Clean up prompt template formatting
2025-09-10 11:56:58 +08:00
yangdx
06db511f3b Remove angle brackets from entity and relationship output formats 2025-09-09 09:21:23 +08:00
yangdx
d218f15a62 Refactor entity extraction with system prompts and output limits
- Add system/user prompt separation
- Set max tokens for endless output fix
- Improve extraction error logging
- Update cache type from extract to summary
2025-09-08 15:20:45 +08:00
yangdx
725db3b240 Fix linting in the prompt 2025-09-06 11:16:49 +08:00
yangdx
219a08b7c9 Restore completion_delimiter 2025-09-06 11:13:37 +08:00
yangdx
528d04a0e4 Update prompt template delimiters 2025-09-06 10:35:06 +08:00
yangdx
5446815008 Refactor entity extraction prompts and remove completion delimiter.
- Remove `completion_delimiter` from prompts
- Update input/output format markers
2025-09-06 09:13:51 +08:00
yangdx
be3f0ebbe5 Simplify entity extraction prompt instructions and remove delimiter 2025-09-04 23:42:11 +08:00