* QA eval dataset as argument, with hotpot and 2wikimultihop as options; JSON schema validation for datasets
* Load dataset file by filename, outsource utilities
* Restructure metric selection
* Add comprehensiveness, diversity and empowerment metrics
* Add promptfoo as an option
* Refactor RAG solution in eval
* Implement LLM-as-a-judge metrics in a uniform way
* Use requests.get instead of wget
* Clean up promptfoo config template
* Minor fixes
* Get promptfoo path instead of hardcoding
* Minor fixes
* Add LLM-as-a-judge prompts
* Minor refactor and logger usage
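The "Use requests.get instead of wget" item could be sketched as below. This is a minimal illustration, not the repository's actual code: the function name `download_dataset` and its signature are hypothetical placeholders.

```python
import requests


def download_dataset(url: str, dest: str, timeout: int = 60) -> None:
    """Fetch a dataset file over HTTP and write it to `dest`.

    Replaces shelling out to wget: requests gives us status-code
    checking and timeouts without an external binary.
    (Hypothetical helper for illustration.)
    """
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
    with open(dest, "wb") as f:
        f.write(resp.content)
```

Unlike a wget subprocess, a non-200 response raises an exception here rather than silently writing an HTML error body to the dataset file.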
7 lines
228 B
YAML
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
# Learn more about building a configuration: https://promptfoo.dev/docs/configuration/guide

description: "My eval"

providers:
  - id: openai:gpt-4o-mini
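A fuller template built on the config above might add a prompt and an LLM-as-a-judge check via promptfoo's `llm-rubric` assertion, matching the judge-style metrics named in the commit list. This is a sketch: the prompt text, test variables, and rubric wording are assumptions, not the project's actual template.

```yaml
# Hypothetical extension of the template above (prompt, vars, and rubric
# are illustrative, not the project's real values).
prompts:
  - "Answer the question: {{question}}"

tests:
  - vars:
      question: "Who wrote Hamlet?"
    assert:
      - type: llm-rubric
        value: "The answer is comprehensive and directly addresses the question"
```

The `llm-rubric` assertion delegates grading to a judge model, which is how criteria like comprehensiveness, diversity, and empowerment can be scored uniformly from the config rather than in ad-hoc code.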