* QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets.
* Load dataset file by filename, outsource utilities
* Restructure metric selection
* Add comprehensiveness, diversity and empowerment metrics
* Add promptfoo as an option
* Refactor RAG solution in eval
* LLM-as-a-judge metrics implemented in a uniform way
* Use requests.get instead of wget
* Clean up promptfoo config template
* Minor fixes
* Get promptfoo path instead of hardcoding
* Minor fixes
* Add LLM-as-a-judge prompts
* Minor refactor and logger usage
10 lines
264 B
JSON
[
  {
    "role": "system",
    "content": "Answer the question using the provided context. Be as brief as possible."
  },
  {
    "role": "user",
    "content": "The question is: `{{ question }}` \n And here is the context: `{{ context }}`"
  }
]
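The file above is a chat-message template with Jinja-style `{{ question }}` and `{{ context }}` placeholders. As a minimal sketch of how such a template can be rendered into a message list (the `render` helper and plain string substitution are illustrative assumptions; the project may use a real template engine instead):

```python
import json

# The prompt template, as stored in the JSON file above.
TEMPLATE = '''[
  {
    "role": "system",
    "content": "Answer the question using the provided context. Be as brief as possible."
  },
  {
    "role": "user",
    "content": "The question is: `{{ question }}` \\n And here is the context: `{{ context }}`"
  }
]'''


def render(template: str, question: str, context: str) -> list:
    """Hypothetical helper: substitute the placeholders, then parse the JSON
    into a list of chat messages suitable for an LLM chat-completion call."""
    rendered = (template
                .replace("{{ question }}", question)
                .replace("{{ context }}", context))
    return json.loads(rendered)


messages = render(
    TEMPLATE,
    question="Who wrote Hamlet?",
    context="Hamlet is a tragedy written by William Shakespeare.",
)
print(messages[1]["content"])
```

The rendered list keeps the system instruction fixed while the user message carries the per-example question and retrieved context, which is what makes the same template reusable across every item in the eval dataset.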