cognee/cognee/infrastructure/llm/prompts/llm_judge_prompts.py
# LLM-as-a-judge metrics as described in the GraphRAG paper: https://arxiv.org/abs/2404.16130
llm_judge_prompts = {
"correctness": "Determine whether the actual output is factually correct based on the expected output.",
"comprehensiveness": "Determine how much detail the answer provides to cover all the aspects and details of the question.",
"diversity": "Determine how varied and rich the answer is in providing different perspectives and insights on the question.",
"empowerment": "Determine how well the answer helps the reader understand and make informed judgements about the topic.",
"directness": "Determine how specifically and clearly the answer addresses the question.",
}
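
# ---------------------------------------------------------------------------
# Usage sketch (illustrative only, not cognee's actual evaluation harness):
# each prompt above becomes the system message of a judge call that scores an
# actual answer against an expected one. The `judge_answer` helper, the
# OpenAI client, the model choice, and the 1-5 scale are all assumptions.
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    from openai import OpenAI  # assumes the `openai` package and an API key are available

    client = OpenAI()

    def judge_answer(metric: str, question: str, actual: str, expected: str) -> str:
        """Rate `actual` on one judge metric with an LLM (hypothetical helper)."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # any chat-completions model; this choice is an assumption
            messages=[
                {
                    "role": "system",
                    "content": llm_judge_prompts[metric] + " Reply with a score from 1 to 5.",
                },
                {
                    "role": "user",
                    "content": f"Question: {question}\n"
                    f"Expected output: {expected}\n"
                    f"Actual output: {actual}",
                },
            ],
        )
        return response.choices[0].message.content

    # Example: score a factually correct answer on the "correctness" metric.
    print(
        judge_answer(
            "correctness",
            question="Who wrote Dubliners?",
            actual="James Joyce wrote Dubliners.",
            expected="James Joyce",
        )
    )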