<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

• Created `DirectLLMEvalAdapter`, a lightweight alternative to DeepEval for answer evaluation.
• Added evaluation prompt files defining the scoring criteria and output format.
• Made the adapter selectable via `evaluation_engine = "DirectLLM"` in the config; it currently supports only the "correctness" metric.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
  - Introduced a new evaluation method that compares model responses against a reference answer using structured prompt templates. This approach enables automated scoring (ranging from 0 to 1) along with brief justifications.

- **Enhancements**
  - Updated the configuration to clearly distinguish between evaluation options, providing end users with a more transparent and reliable assessment process.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
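As a rough illustration of the scoring behavior described above (a 0–1 score plus a brief justification), the sketch below shows one way a single "correctness" result could be represented. The `CorrectnessResult` name and its fields are hypothetical; the actual output format is defined by the adapter and the prompt files added in this PR, which are not shown here.

```python
# Hypothetical sketch only: the real DirectLLMEvalAdapter defines its own
# output format in its evaluation prompt templates.
from dataclasses import dataclass


@dataclass
class CorrectnessResult:
    score: float        # expected to lie in [0.0, 1.0]
    justification: str  # brief explanation produced by the evaluating LLM

    def __post_init__(self):
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be between 0 and 1, got {self.score}")
```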
18 lines · 586 B · Python
```python
from enum import Enum
from typing import Type

from evals.eval_framework.evaluation.deep_eval_adapter import DeepEvalAdapter
from evals.eval_framework.evaluation.direct_llm_eval_adapter import DirectLLMEvalAdapter


class EvaluatorAdapter(Enum):
    """Maps an `evaluation_engine` config value to the corresponding evaluator adapter class."""

    DEEPEVAL = ("DeepEval", DeepEvalAdapter)
    DIRECT_LLM = ("DirectLLM", DirectLLMEvalAdapter)

    def __new__(cls, adapter_name: str, adapter_class: Type):
        # Use the adapter name as the enum value and attach the class as an attribute.
        obj = object.__new__(cls)
        obj._value_ = adapter_name
        obj.adapter_class = adapter_class
        return obj

    def __str__(self):
        return self.value
```
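A brief usage note: because `__new__` stores the adapter name as the enum value, the string taken from the `evaluation_engine` setting can be resolved to its adapter class with a plain value lookup. The snippet below is a sketch of that lookup only; the surrounding evaluation code and the adapter constructors are not part of this diff.

```python
# Resolve the adapter class from the configured engine name.
adapter_class = EvaluatorAdapter("DirectLLM").adapter_class  # -> DirectLLMEvalAdapter

# __str__ returns the stored value, so the enum prints as its configured name.
print(EvaluatorAdapter.DEEPEVAL)  # prints "DeepEval"
```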