cognee

Author	SHA1	Message	Date
lxobr	bb8cb692e0	Cog 1293 corpus builder custom cognify tasks (#527) ## Description - Enable custom tasks in corpus building ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Introduced a configurable option to specify the task retrieval strategy during corpus building. - Enhanced the workflow with integrated task fetching, featuring a default retrieval mechanism. - Updated evaluation configuration to support customizable task selection for more flexible operations. - Added a new abstract base class for defining various task retrieval strategies. - Introduced a new enumeration to map task getter types to their corresponding classes. - Dependencies - Added a new dependency for downloading files from Google Drive.	2025-02-12 16:44:08 +01:00
vasilije	e6db870264	Add musique adapter base	2025-02-11 17:16:48 -05:00
hajdul88	6a0c0e3ef8	feat: Cognee evaluation framework development (#498) This PR contains the evaluation framework development for cognee ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Expanded evaluation framework now integrates asynchronous corpus building, question answering, and performance evaluation with adaptive benchmarks for improved metrics (correctness, exact match, and F1 score). - Infrastructure - Added database integration for persistent storage of questions, answers, and metrics. - Launched an interactive metrics dashboard featuring advanced visualizations. - Introduced an automated testing workflow for continuous quality assurance. - Documentation - Updated guidelines for generating concise, clear answers.	2025-02-11 16:31:54 +01:00
Boris	f75e35c337	fix: custom model pipeline (#508) ## Description ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features • Graph visualizations now allow exporting to a user-specified file path for more flexible output management. • The text embedding process has been enhanced with an additional tokenizer option for improved performance. • A new `ExtendableDataPoint` class has been introduced for future extensions. • New JSON files for companies and individuals have been added to facilitate testing and data processing. - Improvements • Search functionality now uses updated identifiers for more reliable content retrieval. • Metadata handling has been streamlined across various classes by removing unnecessary type specifications. • Enhanced serialization of properties in the Neo4j adapter for improved handling of complex structures. • The setup process for databases has been improved with a new asynchronous setup function. - Chores • Dependency and configuration updates improve overall stability and performance.	2025-02-08 02:00:15 +01:00
Igor Ilic	5fe7ff9883	refactor: Refactor search so graph completion is used by default (#505) ## Description Refactor search so query type doesn't need to be provided to make it simpler for new users ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Refactor - Improved the search interface by standardizing parameter usage with explicit keyword arguments for specifying search types, enhancing clarity and consistency. - Tests - Updated test cases and example integrations to align with the revised search parameters, ensuring consistent behavior and reliable validation of search outcomes.	2025-02-07 17:16:34 +01:00
Vasilije	4d3acc358a	fix: mcp improvements (#472) ## Description ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - Dependency Update - Downgraded `mcp` package version from 1.2.0 to 1.1.3 - Updated `cognee` dependency to include additional features with `cognee[codegraph]` - New Features - Introduced a new tool, "codify", for transforming codebases into knowledge graphs - Enhanced the existing "search" tool to accept a new parameter for search type - Improvements - Streamlined search functionality with a new modular approach - Added new asynchronous function for retrieving and formatting code parts - Documentation - Updated import paths for `SearchType` in various modules and tests to reflect structural changes - Code Cleanup - Removed legacy search module and associated classes/functions - Refined data transfer object classes for consistency and clarity --------- Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>	2025-02-04 08:47:31 +01:00
Igor Ilic	8879f3fbbe	feat: Add gemini support [COG-1023] (#485) ## Description PR to test Gemini PR from holchan 1. Add Gemini LLM and Gemini Embedding support 2. Fix CodeGraph issue with chunks being bigger than maximum token value 3. Add Tokenizer adapters to CodeGraph ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin ## Summary by CodeRabbit - New Features - Added support for the Gemini LLM provider. - Expanded LLM configuration options. - Introduced a new GitHub Actions workflow for multimetric QA evaluation. - Added new environment variables for LLM and embedding configurations across various workflows. - Bug Fixes - Improved error handling in various components. - Updated tokenization and embedding processes. - Removed warning related to missing `dict` method in data items. - Refactor - Simplified token extraction and decoding methods. - Updated tokenizer interfaces. - Removed deprecated dependencies. - Enhanced retry logic and error handling in embedding processes. - Documentation - Updated configuration comments and settings. - Chores - Updated GitHub Actions workflows to accommodate new secrets and environment variables. - Modified evaluation parameters. - Adjusted dependency management for optional libraries. --------- Co-authored-by: holchan <61059652+holchan@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2025-01-31 18:03:23 +01:00
Vasilije	8a50da8ff5	Merge pull request #475 from topoteretes/feat/COG-1060-code-pipeline-endpoints feat: add codegraph related API endpoints	2025-01-28 14:46:52 +01:00
alekszievr	5e076689ad	Feat: [COG-1074] fix multimetric eval bug (#463) * feat: make tasks a configurable argument in the cognify function * fix: add data points task * Ugly hack for multi-metric eval bug * some cleanup --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-01-28 13:05:22 +01:00
Boris Arzentar	3320bc8f2c	feat: add codegraph related API endpoints	2025-01-28 10:08:59 +01:00
alekszievr	4e3a666b33	Feat: Save and load contexts and answers for eval (#462) * feat: make tasks a configurable argument in the cognify function * fix: add data points task * eval on random samples instead of first couple * Save and load contexts and answers * Fix random seed usage and handle empty descriptions * include insights search in cognee option * create output dir if doesnt exist --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-01-22 16:17:01 +01:00
alekszievr	75bc7f67eb	feat: Add incremental eval option to paramset (#446) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Support 4 different rag options in eval * Minor refactor and logger usage * feat: make tasks a configurable argument in the cognify function * Run eval on a set of parameters and save results as json and png * fix: add data points task * script for running all param combinations * enable context provider to get tasks as param * bugfix in simple rag * Incremental eval of cognee pipeline * potential fix: single asyncio run * temp fix: exclude insights * Remove insights, have single asyncio run, refactor * Include incremental eval in accepted paramsets * minor fixes * handle pipeline slices in utils * Handle insights and customize search types * Handle retrieved edges more safely * bugfix * fix simple rag --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com> Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-01-17 18:04:31 +01:00
alekszievr	2e010f8dd1	Incremental eval of cognee pipeline (#445) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Support 4 different rag options in eval * Minor refactor and logger usage * feat: make tasks a configurable argument in the cognify function * Run eval on a set of parameters and save results as json and png * fix: add data points task * script for running all param combinations * enable context provider to get tasks as param * bugfix in simple rag * Incremental eval of cognee pipeline * potential fix: single asyncio run * temp fix: exclude insights * Remove insights, have single asyncio run, refactor * minor fixes * handle pipeline slices in utils * include all options in params json --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com> Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2025-01-17 14:16:48 +01:00
alekszievr	8ec1e48ff6	Run eval on a set of parameters and save them as png and json (#443) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Support 4 different rag options in eval * Minor refactor and logger usage * Run eval on a set of parameters and save results as json and png * script for running all param combinations * bugfix in simple rag * potential fix: single asyncio run * temp fix: exclude insights * Remove insights, have single asyncio run, refactor --------- Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>	2025-01-17 00:18:51 +01:00
alekszievr	3494521cae	Support 4 different rag options in eval (#439) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Support 4 different rag options in eval * Minor refactor and logger usage	2025-01-15 15:34:13 +01:00
alekszievr	6653d73556	Feat/cog 950 improve metric selection (#435) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets. * Load dataset file by filename, outsource utilities * restructure metric selection * Add comprehensiveness, diversity and empowerment metrics * add promptfoo as an option * refactor RAG solution in eval * LLM as a judge metrics implemented in a uniform way * Use requests.get instead of wget * clean up promptfoo config template * minor fixes * get promptfoo path instead of hardcoding * minor fixes * Add LLM as a judge prompts * Minor refactor and logger usage	2025-01-15 10:45:55 +01:00
alekszievr	a4ad1702ed	Feat/cog 946 abstract eval dataset (#418) * QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets. * Load dataset file by filename, outsource utilities * Use requests.get instead of wget	2025-01-14 11:33:55 +01:00
hajdul88	e2ad54d88e	Fix: deleting incorrect repo path	2025-01-10 15:54:45 +01:00
hajdul88	6177d04b44	feat: implements code retreiver	2025-01-10 13:03:34 +01:00
hajdul88	9604d95ba5	feat: adds basic retriever for swe bench	2025-01-09 19:54:58 +01:00
Rita Aleksziev	18bb282fbc	Adjust SWE-bench script to code graph pipeline call	2025-01-09 14:52:02 +01:00
vasilije	76a0aa7e8b	Fix linter issues	2025-01-05 19:48:35 +01:00
vasilije	6dafe73a6b	Fix linter issues	2025-01-05 19:24:55 +01:00
vasilije	649fcf2ba8	Fix linter issues	2025-01-05 19:21:09 +01:00
vasilije	60c8fd103b	ruff format	2025-01-05 19:09:08 +01:00
lxobr	da5e3ab24d	COG 870 Remove duplicate edges from the code graph (#293) * feat: turn summarize_code into generator * feat: extract run_code_graph_pipeline, update the pipeline * feat: minimal code graph example * refactor: update argument * refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline * refactor: indentation and whitespace nits * refactor: add deprecated use comments and warnings --------- Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Boris <boris@topoteretes.com>	2024-12-17 12:02:25 +01:00
alekszievr	4f2745504c	Calculate official hotpot EM and F1 scores (#292)	2024-12-10 19:16:12 +01:00
Boris	348610e73c	fix: refactor get_graph_from_model to return nodes and edges correctly (#257) * fix: handle rate limit error coming from llm model * fix: fixes lost edges and nodes in get_graph_from_model * fix: fixes database pruning issue in pgvector (#261) * fix: cognee_demo notebook pipeline is not saving summaries --------- Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>	2024-12-06 12:52:01 +01:00
Boris Arzentar	d49ab4c3b5	feat: update code-graph notebook	2024-12-03 23:48:12 +01:00
Boris Arzentar	b89a4b8054	Merge remote-tracking branch 'origin/main' into code-graph	2024-12-03 21:14:19 +01:00
Rita Aleksziev	a0d5102bd8	add some spaces for readability	2024-12-03 17:22:23 +01:00
Rita Aleksziev	0fbb50960b	prompt renaming	2024-12-03 15:59:03 +01:00
Rita Aleksziev	dc082de4c2	minor bugfix in folder creation	2024-12-02 14:54:40 +01:00
Rita Aleksziev	f966f099fc	Prompt renaming to more specific names. Minor code changes.	2024-12-02 12:18:00 +01:00
Boris Arzentar	11acabdb6a	fix: remove duplicate nodes and edges before saving; Fix FalkorDB vector index;	2024-12-02 10:10:18 +01:00
Rita Aleksziev	a4c56f118d	Connect code graph pipeline + retriever + benchmarking	2024-11-29 15:24:49 +01:00
Rita Aleksziev	4da1657140	merge changes from code-graph	2024-11-29 12:16:36 +01:00
Rita Aleksziev	8f241fa6c5	convert edge to string	2024-11-29 12:05:52 +01:00
Leon Luithlen	a5ae9185cd	Replicate PR 33	2024-11-29 11:40:51 +01:00
Leon Luithlen	d9fc740ec0	Fix merge conflicts	2024-11-29 11:33:05 +01:00
Leon Luithlen	b46af5a6f6	Update eval_swe_bench	2024-11-29 11:31:03 +01:00
Leon Luithlen	618d476c30	Add code formating to usermod command	2024-11-29 11:30:39 +01:00
Leon Luithlen	5036f3a85f	Add -y to setup_ubuntu_instance.sh commands and update EC2_README	2024-11-29 11:30:39 +01:00
Leon Luithlen	1bfa3a0ea3	Rebase onto code-graph	2024-11-29 11:30:30 +01:00
Rita Aleksziev	996b3a658b	add custom metric implementation	2024-11-28 16:53:33 +01:00
Rita Aleksziev	8edfe7c5a4	feat/connect code graph pipeline to benchmarking	2024-11-28 16:52:54 +01:00
Boris Arzentar	2408fd7a01	fix: falkordb adapter errors	2024-11-28 09:12:37 +01:00
Rita Aleksziev	4aa634d5e1	Eval function takes eval_metric as input. Works with deepeval metrics like AnswerRelevancyMetric	2024-11-27 16:14:05 +01:00
Rita Aleksziev	f47b185a9e	feat/add correctness score calculation with LLM as a judge	2024-11-27 10:53:48 +01:00
Boris	64b8aac86f	feat: code graph swe integration Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: hande-k <handekafkas7@gmail.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>	2024-11-27 09:32:29 +01:00

70 commits