Commit graph

4 commits

Author SHA1 Message Date
alekszievr
6653d73556
Feat/cog 950 improve metric selection (#435)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Minor refactor and logger usage
2025-01-15 10:45:55 +01:00
vasilije
60c8fd103b ruff format 2025-01-05 19:09:08 +01:00
alekszievr
4f2745504c
Calculate official hotpot EM and F1 scores (#292) 2024-12-10 19:16:12 +01:00
Rita Aleksziev
996b3a658b add custom metric implementation 2024-11-28 16:53:33 +01:00