yangdx
4de1473875
Improve entity extraction prompts and error message formatting
...
• Fix typo in error log message
• Clarify format requirements in prompts
• Make extraction instructions clearer
• Improve user prompt consistency
2025-09-14 13:45:59 +08:00
yangdx
70fee5bbeb
Fix syntax warning by removing examples from fix_tuple_delimiter_corruption docstring
2025-09-14 12:37:21 +08:00
yangdx
20c5127c7c
Merge branch 'optimize-extraction' into return-data-only
2025-09-14 12:33:37 +08:00
yangdx
619553021e
Fix delimiter processing and optimize case-sensitive handling
...
• Fix completion_delimiter reference bug
• Add case check before lowercase conversion
• Improve delimiter corruption handling
• Optimize redundant processing logic
2025-09-14 12:23:48 +08:00
yangdx
ff705a2323
Fix tuple delimiter corruption when the closing bracket is missing; handle <|#: -> <|#|> pattern
2025-09-14 11:44:21 +08:00
yangdx
fd48afdb00
Use "relation" instead of "relationship" in extraction prompt, and support both formats for safety
2025-09-14 11:43:35 +08:00
yangdx
1dc96f3959
Merge branch 'optimize-extraction' into return-data-only
2025-09-14 05:37:48 +08:00
yangdx
b820d8d588
Fix entity/relationship record parsing in extraction result processing
2025-09-14 05:35:01 +08:00
yangdx
4f5ad76c2c
Add entity vector database upsert for entities newly added by edge upserts
2025-09-14 05:04:45 +08:00
yangdx
7cc2b69bcf
Fix linting
2025-09-14 05:02:02 +08:00
yangdx
cddd81a86c
Fix LLM output format errors in extraction result processing
...
- Handle tuple_delimiter as record separator
- Add format validation and correction
- Add warning for format errors
2025-09-14 04:13:01 +08:00
yangdx
419f4f0268
Update web assets
2025-09-14 02:31:42 +08:00
yangdx
d993464a92
Restructure entity extraction prompt with clearer formatting and examples
...
* Improved instruction clarity
* Added better formatting structure
* Enhanced delimiter usage rules
* Clarified relationship handling
* Better third-person guidelines
2025-09-14 02:30:32 +08:00
yangdx
5311083f43
Rename "Process" entity type to "Method" across all components
2025-09-14 02:30:05 +08:00
yangdx
7060cf17f0
Add Process and Data entity types to LLM extraction system
...
• Add Process and Data to default types
• Update env.example configuration
• Add translations for new entities
• Support 5 languages (en/zh/fr/ar/tw)
2025-09-14 01:14:47 +08:00
yangdx
2686fc526e
Change entity type from CreativeWork to Content and update delimiter
...
• Replace CreativeWork with Content type
• Improve LLM output error messages
• Update prompt for binary relationships
• Fix delimiter corruption examples
2025-09-14 00:55:15 +08:00
yangdx
4a5ab5121d
Change delimiter from <|S|> to <|#|> and clarify formatting rules
2025-09-13 22:58:56 +08:00
yangdx
244122094d
Merge branch 'optimize-extraction' into return-data-only
2025-09-13 15:38:50 +08:00
yangdx
41cdeaeaad
Add Concept and NaturalObject to default entity types
2025-09-13 15:37:11 +08:00
yangdx
0ffb5d5f2d
Replace search API with aquery_data for consistent raw data retrieval, mirroring aquery results
...
• Reuse existing query logic paths and remove kg_search function entirely
• Update kg_query/naive_query to return raw data as needed
2025-09-13 15:30:29 +08:00
yangdx
c2d064b580
Bump API version to 0221
2025-09-13 14:06:20 +08:00
yangdx
0496ddcb92
Merge branch 'optimize-extraction' into return-data-only
2025-09-13 13:33:07 +08:00
yangdx
f7aa108cc2
Update env.example
2025-09-13 11:27:02 +08:00
yangdx
4ce5f9014c
Improve error messages in entity and relationship extraction
2025-09-13 11:20:03 +08:00
yangdx
f3b5352019
Refine default entity types
2025-09-13 11:17:06 +08:00
yangdx
88b81658ea
Merge branch 'optimize-extraction' into return-data-only
2025-09-13 09:56:05 +08:00
yangdx
bf423a4ce1
Clarify output structure in prompt instructions by adding field count specifications
2025-09-13 09:51:33 +08:00
yangdx
369f799b16
Refine entity extraction prompts for clarity and consistency
...
• Clarify tuple delimiter usage
• Soften proper noun translation rules
• Standardize language requirements
• Improve output format consistency
2025-09-13 08:14:46 +08:00
yangdx
b6eb2f1c82
Merge branch 'optimize-extraction' into return-data-only
2025-09-12 18:13:59 +08:00
yangdx
9a2e8be5a7
Fix extraction validation and delimiter comment accuracy
...
• Change < to != for exact length check
• Require exactly 4 fields in entity validation
• Require exactly 5 fields in relationship validation
• Correct delimiter comment example
2025-09-12 18:13:25 +08:00
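A minimal sketch of the exact-length check described in the commit above, not the project's actual code. The function name `has_exact_field_count`, the record layout (record type in the first field), and the default delimiter `<|S|>` are assumptions for illustration. The point is that `!=` rejects records with extra fields, which a `<` comparison would let through.

```python
# Hypothetical helper: the name and record layout are assumptions,
# not taken from the repository.
EXPECTED_FIELDS = {"entity": 4, "relationship": 5}

def has_exact_field_count(record: str, delimiter: str = "<|S|>") -> bool:
    """Return True only if the record has exactly the expected number of fields."""
    fields = record.split(delimiter)
    expected = EXPECTED_FIELDS.get(fields[0].strip().lower())
    # An exact comparison also rejects records with too many fields.
    return expected is not None and len(fields) == expected
```

With this check, an entity record carrying a stray fifth field is rejected instead of being silently accepted.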
yangdx
2eddd1d46d
Merge branch 'optimize-extraction' into return-data-only
2025-09-12 18:03:58 +08:00
yangdx
8088b7e07a
Fix tuple delimiter corruption handling and update documentation
2025-09-12 18:03:37 +08:00
yangdx
f33e69204d
Merge branch 'optimize-extraction' into return-data-only
2025-09-12 17:46:04 +08:00
yangdx
8a3e2c03a9
Fix tuple delimiter corruption patterns with pipes and brackets
...
- Handle <||S||> malformed delimiters
- Fix <||> empty pipe sequences
- Repair <|| incomplete patterns
- Process ||S|| missing brackets
- Improve delimiter normalization
2025-09-12 17:45:32 +08:00
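A rough sketch of how the malformed delimiter variants listed in the commit above could be normalized back to `<|S|>`. This is an illustration under stated assumptions, not the repository's `fix_tuple_delimiter_corruption` implementation: the function name `normalize_tuple_delimiter`, the regexes, and their ordering are all hypothetical.

```python
import re

def normalize_tuple_delimiter(text: str, core: str = "S") -> str:
    """Rewrite a few malformed delimiter variants to <|core|> (illustrative only)."""
    good = f"<|{core}|>"
    c = re.escape(core)
    # <||S||> : extra pipes inside the brackets
    text = re.sub(rf"<\|+{c}\|+>", good, text)
    # ||S|| : pipes present but both brackets missing
    text = re.sub(rf"(?<![<|])\|+{c}\|+(?![>|])", good, text)
    # <||> : empty pipe sequence
    text = re.sub(r"<\|{2,}>", good, text)
    # <|| : incomplete pattern with nothing closing it
    text = re.sub(r"<\|{2,}(?![|>])", good, text)
    return text
```

A well-formed `<|S|>` passes through unchanged, so the function can be applied to every record before splitting.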
yangdx
477adbbf42
Merge branch 'optimize-extraction' into return-data-only
2025-09-12 17:02:08 +08:00
yangdx
43f6fcea6c
Fix linting
2025-09-12 17:00:53 +08:00
yangdx
1ee1fe895b
Merge branch 'qdrant1.7' into optimize-extraction
2025-09-12 16:40:53 +08:00
yangdx
69ca447f45
Sort descriptions by timestamp, then by description length, to improve merge consistency
2025-09-12 13:59:26 +08:00
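The ordering described in the commit above can be sketched with a two-part sort key. The dictionary keys `timestamp` and `description` and the function name `order_descriptions` are assumptions, and the commit message does not say which direction the length tie-break runs; this sketch assumes longer descriptions come first among equal timestamps.

```python
# Hypothetical sketch: field names and tie-break direction are assumptions.
def order_descriptions(items: list[dict]) -> list[dict]:
    """Sort by ascending timestamp, then by descending description length."""
    return sorted(items, key=lambda d: (d["timestamp"], -len(d["description"])))
```

A deterministic secondary key means that repeated merges of the same inputs produce the same ordering, even when several descriptions share a timestamp.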
yangdx
668a7c1f16
Bump API version to 0220
2025-09-12 12:32:42 +08:00
yangdx
0221213b9b
Improve entity summarization with JSONL format and fix tuple delimiters
...
• Convert descriptions to JSONL format
• Add token-based truncation helper
• Enhance entity name consistency rules
• Improve summarization prompt clarity
• Fix tuple delimiter corruption patterns
2025-09-12 12:32:08 +08:00
yangdx
1892ed23cc
Change tuple delimiter from <|SEP|> to <|S|> across codebase
...
• Update prompt instruction clarity
• Correct utility function examples
• Update regex pattern comments
2025-09-12 08:57:46 +08:00
yangdx
b96f1484ec
Shorten tuple delimiter to <|S|> and refine relationship extraction text
...
• Remove redundant "within input text"
• Clarify relationship extraction scope
2025-09-12 08:36:43 +08:00
yangdx
c07bcbff44
Fix tuple delimiter corruption patterns and add missing edge cases
2025-09-12 08:35:37 +08:00
yangdx
8660bf34e4
Add timestamp tracking for LLM responses and entity/relationship data
...
- Track timestamps for cache hits/misses
- Add timestamp to entity/relationship objects
- Sort descriptions by timestamp order
- Preserve temporal ordering in merges
2025-09-12 04:34:12 +08:00
yangdx
40688def20
Refactor tuple delimiter corruption fix into reusable utility function
...
- Extract regex fixes to utils module
- Add case-insensitive delimiter handling
2025-09-12 04:10:14 +08:00
yangdx
b9f80263b8
Simplify tuple delimiter regex patterns for LLM output fixing
...
• Consolidate 6 regex patterns into 3
• More efficient pattern matching
• Clearer comments and examples
• Same functionality, less code
• Better maintainability
2025-09-12 00:56:40 +08:00
yangdx
78eadc1d6c
Rename function to clarify rebuild vs process extraction contexts
2025-09-11 23:21:27 +08:00
yangdx
b32bd993e1
Bump API version to 0219
2025-09-11 22:47:22 +08:00
yangdx
4ce823b4dd
Handle empty context in mix mode and improve query logging
2025-09-11 18:58:37 +08:00
yangdx
87f1b47218
Update env.example
2025-09-11 15:50:16 +08:00