* feat: make tasks a configurable argument in the cognify function
* fix: add data points task
* Ugly hack for multi-metric eval bug
* some cleanup
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
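The "tasks as a configurable argument" change above can be sketched roughly as follows. This is a minimal stdlib-only illustration of the pattern, not cognee's actual implementation; the task names and data shapes here are hypothetical.

```python
import asyncio

async def extract_chunks(data):
    # Hypothetical task: split text into fixed-size chunks (illustrative only).
    return [data[i:i + 10] for i in range(0, len(data), 10)]

async def add_data_points(chunks):
    # Hypothetical task: wrap each chunk as a data point (illustrative only).
    return [{"text": c} for c in chunks]

# Default pipeline, used when the caller does not override tasks.
DEFAULT_TASKS = [extract_chunks, add_data_points]

async def cognify(data, tasks=None):
    # Tasks default to the built-in pipeline but can be supplied by the caller,
    # which is the configurability the commit above introduces.
    result = data
    for task in tasks if tasks is not None else DEFAULT_TASKS:
        result = await task(result)
    return result
```

A caller could then run `asyncio.run(cognify(text, tasks=[extract_chunks]))` to execute only a subset of the pipeline.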
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. JSON schema validation for datasets.
* Load dataset file by filename, outsource utilities
* restructure metric selection
* Add comprehensiveness, diversity and empowerment metrics
* add promptfoo as an option
* refactor RAG solution in eval
* LLM-as-a-judge metrics implemented in a uniform way
* Use requests.get instead of wget
* clean up promptfoo config template
* minor fixes
* get promptfoo path instead of hardcoding
* minor fixes
* Add LLM-as-a-judge prompts
* Support 4 different RAG options in eval
* Minor refactor and logger usage
* Run eval on a set of parameters and save results as json and png
* script for running all param combinations
* enable context provider to get tasks as param
* bugfix in simple RAG
* Incremental eval of cognee pipeline
* potential fix: single asyncio run
* temp fix: exclude insights
* Remove insights, have single asyncio run, refactor
* minor fixes
* handle pipeline slices in utils
* include all options in params json
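The "single asyncio run" fix in the list above amounts to gathering all evaluation coroutines under one event loop instead of calling `asyncio.run()` once per pipeline slice. A minimal sketch of that pattern, with a placeholder slice evaluator (the real slice names and logic differ):

```python
import asyncio

async def eval_slice(name):
    # Placeholder for evaluating one pipeline slice (illustrative only).
    await asyncio.sleep(0)
    return f"{name}: done"

async def run_all(slices):
    # One event loop for every slice; gather preserves input order.
    return await asyncio.gather(*(eval_slice(s) for s in slices))

results = asyncio.run(run_all(["chunking", "summaries", "completion"]))
```

Repeated `asyncio.run()` calls each create and tear down a fresh event loop, which breaks anything holding loop-bound resources across slices; a single run avoids that.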
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
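The "run all param combinations" script with "all options in params json" can be sketched as a Cartesian product over the options file. The parameter keys and values below are hypothetical examples, not the repo's actual params JSON:

```python
import itertools
import json

# Hypothetical params JSON layout; the real keys and options may differ.
params_json = """
{
  "rag_option": ["simple_rag", "cognee", "graph_rag", "promptfoo"],
  "dataset": ["hotpot", "2wikimultihop"],
  "num_samples": [10, 50]
}
"""

def all_param_combinations(params):
    # Yield one dict per combination: the Cartesian product of every
    # option list, with keys sorted for deterministic ordering.
    keys = sorted(params)
    for values in itertools.product(*(params[k] for k in keys)):
        yield dict(zip(keys, values))

combos = list(all_param_combinations(json.loads(params_json)))
# 4 rag options * 2 datasets * 2 sample sizes = 16 combinations
```

Each yielded dict can then be passed to the eval entry point, with results accumulated into the JSON/PNG outputs mentioned above.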