<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

- Created `DirectLLMEvalAdapter`, a lightweight alternative to DeepEval for answer evaluation
- Added evaluation prompt files defining the scoring criteria and output format
- Made the adapter selectable via `evaluation_engine = "DirectLLM"` in the config; currently only the "correctness" metric is supported

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
  - Introduced a new evaluation method that compares model responses against a reference answer using structured prompt templates. This approach enables automated scoring (ranging from 0 to 1) along with brief justifications.
- **Enhancements**
  - Updated the configuration to clearly distinguish between evaluation options, providing end users with a more transparent and reliable assessment process.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
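As a rough illustration of how an adapter could be selected from the `evaluation_engine` setting, consider the sketch below. Only the option value `"DirectLLM"` and the class name `DirectLLMEvalAdapter` come from this PR; the registry layout and the `DeepEvalAdapter` name are assumptions for illustration.

```python
# Hypothetical sketch: resolve an evaluator from the configured engine name.
# The mapping and error handling here are illustrative, not the PR's code.

def get_eval_adapter(evaluation_engine: str) -> str:
    """Return the adapter class name for a configured evaluation engine."""
    adapters = {
        "DeepEval": "DeepEvalAdapter",        # assumed existing default
        "DirectLLM": "DirectLLMEvalAdapter",  # new lightweight adapter
    }
    try:
        return adapters[evaluation_engine]
    except KeyError:
        raise ValueError(f"Unknown evaluation_engine: {evaluation_engine!r}")
```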
You are helping a reasonable person evaluate and score answers.

• Compare the provided answer to the golden answer based on common-sense meaning and understanding.
• Focus on the meaning, not the exact wording or structure.
• If the answer is correct, don't penalize it for being too short or too long.
• Extra details are fine as long as the correct answer is included.
• Score should be between 0 and 1.

Provide:

1. A numerical score
2. A brief explanation justifying the score
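The prompt above asks the model for a numeric score between 0 and 1 plus a brief justification. A minimal parser for a response in that shape might look like the following sketch; the function name and the assumed reply format are hypothetical, and the PR's actual parsing logic is not shown here.

```python
import re

def parse_eval_response(text: str) -> tuple[float, str]:
    """Extract a 0-1 score and an explanation from an LLM evaluation reply.

    Assumes the model answers roughly in the requested format, e.g.:
        "Score: 0.9\nExplanation: Matches the golden answer."
    """
    match = re.search(r"([01](?:\.\d+)?)", text)
    if not match:
        raise ValueError("No numeric score found in evaluation response")
    score = float(match.group(1))
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Score out of range: {score}")
    # Treat everything after the score as the justification text.
    explanation = text[match.end():].lstrip(" :\n")
    return score, explanation
```

A structured-output feature of the LLM client (e.g. JSON mode), where available, would be more robust than free-text parsing like this.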