Commit graph

61 commits

Author SHA1 Message Date
Boris Arzentar
3320bc8f2c feat: add codegraph related API endpoints 2025-01-28 10:08:59 +01:00
alekszievr
4e3a666b33
Feat: Save and load contexts and answers for eval (#462)
* feat: make tasks a configurable argument in the cognify function

* fix: add data points task

* eval on random samples instead of first couple

* Save and load contexts and answers

* Fix random seed usage and handle empty descriptions

* include insights search in cognee option

* create output dir if doesn't exist

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-01-22 16:17:01 +01:00
alekszievr
75bc7f67eb
feat: Add incremental eval option to paramset (#446)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Support 4 different rag options in eval

* Minor refactor and logger usage

* feat: make tasks a configurable argument in the cognify function

* Run eval on a set of parameters and save results as json and png

* fix: add data points task

* script for running all param combinations

* enable context provider to get tasks as param

* bugfix in simple rag

* Incremental eval of cognee pipeline

* potential fix: single asyncio run

* temp fix: exclude insights

* Remove insights, have single asyncio run, refactor

* Include incremental eval in accepted paramsets

* minor fixes

* handle pipeline slices in utils

* Handle insights and customize search types

* Handle retrieved edges more safely

* bugfix

* fix simple rag

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-01-17 18:04:31 +01:00
alekszievr
2e010f8dd1
Incremental eval of cognee pipeline (#445)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Support 4 different rag options in eval

* Minor refactor and logger usage

* feat: make tasks a configurable argument in the cognify function

* Run eval on a set of parameters and save results as json and png

* fix: add data points task

* script for running all param combinations

* enable context provider to get tasks as param

* bugfix in simple rag

* Incremental eval of cognee pipeline

* potential fix: single asyncio run

* temp fix: exclude insights

* Remove insights, have single asyncio run, refactor

* minor fixes

* handle pipeline slices in utils

* include all options in params json

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-01-17 14:16:48 +01:00
alekszievr
8ec1e48ff6
Run eval on a set of parameters and save them as png and json (#443)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Support 4 different rag options in eval

* Minor refactor and logger usage

* Run eval on a set of parameters and save results as json and png

* script for running all param combinations

* bugfix in simple rag

* potential fix: single asyncio run

* temp fix: exclude insights

* Remove insights, have single asyncio run, refactor

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-01-17 00:18:51 +01:00
alekszievr
3494521cae
Support 4 different rag options in eval (#439)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Support 4 different rag options in eval

* Minor refactor and logger usage
2025-01-15 15:34:13 +01:00
alekszievr
6653d73556
Feat/cog 950 improve metric selection (#435)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Minor refactor and logger usage
2025-01-15 10:45:55 +01:00
alekszievr
a4ad1702ed
Feat/cog 946 abstract eval dataset (#418)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* Use requests.get instead of wget
2025-01-14 11:33:55 +01:00
hajdul88
e2ad54d88e Fix: deleting incorrect repo path 2025-01-10 15:54:45 +01:00
hajdul88
6177d04b44 feat: implements code retriever 2025-01-10 13:03:34 +01:00
hajdul88
9604d95ba5 feat: adds basic retriever for swe bench 2025-01-09 19:54:58 +01:00
Rita Aleksziev
18bb282fbc Adjust SWE-bench script to code graph pipeline call 2025-01-09 14:52:02 +01:00
vasilije
76a0aa7e8b Fix linter issues 2025-01-05 19:48:35 +01:00
vasilije
6dafe73a6b Fix linter issues 2025-01-05 19:24:55 +01:00
vasilije
649fcf2ba8 Fix linter issues 2025-01-05 19:21:09 +01:00
vasilije
60c8fd103b ruff format 2025-01-05 19:09:08 +01:00
lxobr
da5e3ab24d
COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 12:02:25 +01:00
alekszievr
4f2745504c
Calculate official hotpot EM and F1 scores (#292) 2024-12-10 19:16:12 +01:00
Boris
348610e73c
fix: refactor get_graph_from_model to return nodes and edges correctly (#257)
* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector (#261)

* fix: cognee_demo notebook pipeline is not saving summaries

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-12-06 12:52:01 +01:00
Boris Arzentar
d49ab4c3b5 feat: update code-graph notebook 2024-12-03 23:48:12 +01:00
Boris Arzentar
b89a4b8054 Merge remote-tracking branch 'origin/main' into code-graph 2024-12-03 21:14:19 +01:00
Rita Aleksziev
a0d5102bd8 add some spaces for readability 2024-12-03 17:22:23 +01:00
Rita Aleksziev
0fbb50960b prompt renaming 2024-12-03 15:59:03 +01:00
Rita Aleksziev
dc082de4c2 minor bugfix in folder creation 2024-12-02 14:54:40 +01:00
Rita Aleksziev
f966f099fc Prompt renaming to more specific names. Minor code changes. 2024-12-02 12:18:00 +01:00
Boris Arzentar
11acabdb6a fix: remove duplicate nodes and edges before saving; Fix FalkorDB vector index; 2024-12-02 10:10:18 +01:00
Rita Aleksziev
a4c56f118d Connect code graph pipeline + retriever + benchmarking 2024-11-29 15:24:49 +01:00
Rita Aleksziev
4da1657140 merge changes from code-graph 2024-11-29 12:16:36 +01:00
Rita Aleksziev
8f241fa6c5 convert edge to string 2024-11-29 12:05:52 +01:00
Leon Luithlen
a5ae9185cd Replicate PR 33 2024-11-29 11:40:51 +01:00
Leon Luithlen
d9fc740ec0 Fix merge conflicts 2024-11-29 11:33:05 +01:00
Leon Luithlen
b46af5a6f6 Update eval_swe_bench 2024-11-29 11:31:03 +01:00
Leon Luithlen
618d476c30 Add code formatting to usermod command 2024-11-29 11:30:39 +01:00
Leon Luithlen
5036f3a85f Add -y to setup_ubuntu_instance.sh commands and update EC2_README 2024-11-29 11:30:39 +01:00
Leon Luithlen
1bfa3a0ea3 Rebase onto code-graph 2024-11-29 11:30:30 +01:00
Rita Aleksziev
996b3a658b add custom metric implementation 2024-11-28 16:53:33 +01:00
Rita Aleksziev
8edfe7c5a4 feat/connect code graph pipeline to benchmarking 2024-11-28 16:52:54 +01:00
Boris Arzentar
2408fd7a01 fix: falkordb adapter errors 2024-11-28 09:12:37 +01:00
Rita Aleksziev
4aa634d5e1 Eval function takes eval_metric as input. Works with deepeval metrics like AnswerRelevancyMetric 2024-11-27 16:14:05 +01:00
Rita Aleksziev
f47b185a9e feat/add correctness score calculation with LLM as a judge 2024-11-27 10:53:48 +01:00
Boris
64b8aac86f
feat: code graph swe integration
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: hande-k <handekafkas7@gmail.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
2024-11-27 09:32:29 +01:00
Rita Aleksziev
e1d8f3ea86 use acreate_structured_output instead of create_structured_output in eval script 2024-11-20 16:02:15 +01:00
Rita Aleksziev
2948089806 Read patch generation instructions from file 2024-11-19 14:07:53 +01:00
Rita Aleksziev
838d98238a Code cleanup 2024-11-19 13:54:04 +01:00
Rita Aleksziev
d986e7c981 minor code cleanup 2024-11-18 15:59:18 +01:00
Rita Aleksziev
98e3445c2c running swebench evaluation as subprocess 2024-11-18 15:12:36 +01:00
Rita Aleksziev
ed08cdb9f9 using the code graph pipeline instead of cognify 2024-11-15 17:56:19 +01:00
Rita Aleksziev
721fde3d60 generating testspecs for data 2024-11-15 17:14:43 +01:00
Rita Aleksziev
094ba7233e Running inference with and without cognee 2024-11-14 16:28:03 +01:00
Rita Aleksziev
aa95aa21af downloading example repo for eval 2024-11-12 17:40:42 +01:00