Commit graph

81 commits

Author SHA1 Message Date
alekszievr
219b68c6b0
chore: Remove old eval files [cog-1567] (#649)
<!-- .github/pull_request_template.md -->

## Description
Removed old, unused eval files. 
- swe-bench eval files are kept here as swe-bench eval is not handled by
the new eval framework
- EC2_readme and cloud/setup_ubuntu_instance.sh will be removed (and
moved to the docs website) as part of another task

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin
2025-03-17 19:19:39 +01:00
hajdul88
e3f3d49a3b
Feature/cog 1312 integrating evaluation framework into dreamify (#562)
<!-- .github/pull_request_template.md -->

## Description
This PR contains eval framework changes due to the autooptimizer
integration

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Enhanced answer generation now returns structured answer details.
  - Search functionality accepts configurable prompt inputs.
  - Option to generate a metrics dashboard from evaluations.
- Corpus building tasks now support adjustable chunk settings for
greater flexibility.
- New task retrieval functionality allows for flexible task
configuration.
  - Introduced new methods for creating and managing metrics dashboards.

- **Refactor/Chore**
- Streamlined API signatures and reorganized module interfaces for
better consistency.
  - Updated import paths to reflect new module structure.

- **Tests**
- Updated test scenarios to align with new configurations and parameter
adjustments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-03 19:55:47 +01:00
lxobr
bee04cad86
Feat/cog 1331 modal run eval (#576)
<!-- .github/pull_request_template.md -->

## Description
- Split metrics dashboard into two modules: calculator (statistics) and
generator (visualization)
- Added aggregate metrics as a new phase in evaluation pipeline
- Created modal example to run multiple evaluations in parallel and
collect results into a single combined output
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced metrics reporting with improved visualizations, including
histogram and confidence interval plots.
- Introduced an asynchronous evaluation process that supports parallel
execution and streamlined result aggregation.
- Added new configuration options to control metrics calculation and
aggregated output storage.

- **Refactor**
- Restructured dashboard generation and evaluation workflows into a more
modular, maintainable design.
- Improved error handling and logging for better feedback during
evaluation processes.

- **Bug Fixes**
- Updated test cases to ensure accurate validation of the new dashboard
generation and metrics calculation functionalities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-03 14:22:32 +01:00
lxobr
ca2cbfab91
feat: add direct llm eval adapter (#591)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
• Created DirectLLMEvalAdapter - a lightweight alternative to DeepEval
for answer evaluation
• Added evaluation prompt files defining scoring criteria and format
• Made adapter selectable via evaluation_engine = "DirectLLM" in config,
supports "correctness" metric only
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced a new evaluation method that compares model responses
against a reference answer using structured prompt templates. This
approach enables automated scoring (ranging from 0 to 1) along with
brief justifications.
  
- **Enhancements**
- Updated the configuration to clearly distinguish between evaluation
options, providing end-users with a more transparent and reliable
assessment process.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-03-01 19:50:20 +01:00
Daniel Molnar
d27f847753
Transition to new retrievers, update searches (#585)
<!-- .github/pull_request_template.md -->

## Description
Delete legacy search implementations after migrating to new retriever
classes

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced search and retrieval capabilities, providing improved context
resolution for code queries, completions, summaries, and graph
connections.
  
- **Refactor**
- Shifted to a modular, object-oriented approach that consolidates query
logic and streamlines error management for a more robust and scalable
experience.
  
- **Bug Fixes**
- Improved error handling for unsupported search types and retrieval
operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-27 15:25:24 +01:00
lxobr
4b7c21d7d8
feat: retrieve golden contexts [COG-1364] (#579)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
• Added load_golden_context parameter to BaseBenchmarkAdapter's abstract
load_corpus method, establishing a common interface for retrieving
supporting evidence
• Refactored HotpotQAAdapter with a modular design: introduced
_get_metadata_field_name method to handle dataset-specific fields
(making it extensible for child classes), implemented get golden context
functionality.
• Refactored TwoWikiMultihopAdapter to inherit from HotpotQAAdapter,
overriding only the necessary methods while reusing parent's
functionality
• Added golden context support to MusiqueQAAdapter with their
decomposition-based format
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced an option to include additional context during corpus
loading, enhancing the quality and flexibility of generated QA pairs.
- **Refactor**
- Streamlined and modularized the processing workflow across different
adapters for improved consistency and maintainability.
- Updated metadata extraction to refine the display of contextual
information.
- Shifted focus in the `TwoWikiMultihopAdapter` from corpus loading to
context extraction.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-27 13:25:47 +01:00
lxobr
9cc357ac1c
Feat/cog 1365 unify retrievers (#572)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
- Created the `BaseRetriever` class to unify all the retrievers and
searches.
- Implemented seven specialized retrievers (summaries, chunks,
completions, graph, graph-summary, insights, code) with consistent
get_context/get_completion interfaces.
- Added json context dumping feature in the current completion
implementations to enable context comparisons.
- Built a comparison framework to validate old vs new implementations.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced multiple retrieval classes for enhanced search
capabilities, including `BaseRetriever`, `ChunksRetriever`,
`CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`,
`GraphSummaryCompletionRetriever`, `InsightsRetriever`, and
`SummariesRetriever`.
- Enhanced query completions with optional context saving for improved
data persistence.
- Implemented advanced tools to compare retrieval outcomes across
different implementations.

- **Refactor**
- Streamlined internal module organization and updated references for
increased maintainability and consistency.
- Added comments indicating future maintenance tasks related to code
merging.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-02-27 12:13:21 +01:00
lxobr
1cb83312fe
feat: add experimental cognify pipeline [COG-1293] (#541)
<!-- .github/pull_request_template.md -->

## Description
- Integrate experimental tasks into the evaluation framework
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced interactive prompt templates for extracting graph nodes,
edge triplets, and relationship names, resulting in more comprehensive
and accurate knowledge graphs.
- Added asynchronous processes to efficiently handle document data and
integrate graph components.
- Launched cascade graph task options to offer enhanced flexibility in
task management workflows.
- Added new functionality for extracting content nodes and relationship
names from text.

- **Refactor**
- Streamlined configurations for prompt processing and task
initialization, improving overall modularity and system stability.
- Updated task getter mechanisms to utilize function-based approaches
for improved flexibility.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-02-25 16:14:27 +01:00
alekszievr
17231de5d0
Test: Parse context pieces separately in MusiqueQAAdapter and adjust tests [cog-1234] (#561)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Tests**
- Updated evaluation checks by removing assertions related to the
relationship between `corpus_list` and `qa_pairs`, now focusing solely
on `qa_pairs` limits.

- **Refactor**
- Improved content processing to append each paragraph individually to
`corpus_list`, enhancing clarity in data structure.
- Simplified type annotations in the `load_corpus` method across
multiple adapters, ensuring consistency in return types.

- **Chores**
- Updated dependency installation commands in GitHub Actions workflows
for Python 3.10, 3.11, and 3.12 to include additional evaluation-related
dependencies.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-02-20 14:23:53 +01:00
Boris
ada466879e
fix: add default params to run_tasks (#563)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced the task execution process by enabling default values for
certain parameters, allowing users to trigger task processing without
supplying every input explicitly.
  
- **Bug Fixes**
- Adjusted asynchronous handling for the `retrieved_edges_to_string`
function to ensure proper execution flow in various components.

- **Documentation**
- Updated markdown formatting in the Jupyter notebook for improved
readability and structure.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-02-19 20:18:51 +01:00
Vasilije
e98d51aac9
Add musique adapter base (#525)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Improved data handling by updating the dataset file path and ensuring
answers are consistently converted to lowercase for reliable processing.
  
- **Tests**
- Introduced unit tests to validate that data adapters instantiate
correctly, return non-empty content, and respect specified limits.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-02-18 19:48:22 +01:00
lxobr
bb8cb692e0
Cog 1293 corpus builder custom cognify tasks (#527)
<!-- .github/pull_request_template.md -->

## Description
- Enable custom tasks in corpus building
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a configurable option to specify the task retrieval
strategy during corpus building.
- Enhanced the workflow with integrated task fetching, featuring a
default retrieval mechanism.
- Updated evaluation configuration to support customizable task
selection for more flexible operations.
- Added a new abstract base class for defining various task retrieval
strategies.
- Introduced a new enumeration to map task getter types to their
corresponding classes.
  
- **Dependencies**
  - Added a new dependency for downloading files from Google Drive.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-12 16:44:08 +01:00
vasilije
e6db870264 Add musique adapter base 2025-02-11 17:16:48 -05:00
hajdul88
6a0c0e3ef8
feat: Cognee evaluation framework development (#498)
<!-- .github/pull_request_template.md -->

This PR contains the evaluation framework development for cognee

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Expanded evaluation framework now integrates asynchronous corpus
building, question answering, and performance evaluation with adaptive
benchmarks for improved metrics (correctness, exact match, and F1
score).

- **Infrastructure**
- Added database integration for persistent storage of questions,
answers, and metrics.
- Launched an interactive metrics dashboard featuring advanced
visualizations.
- Introduced an automated testing workflow for continuous quality
assurance.

- **Documentation**
  - Updated guidelines for generating concise, clear answers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-11 16:31:54 +01:00
Boris
f75e35c337
fix: custom model pipeline (#508)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
• Graph visualizations now allow exporting to a user-specified file path
for more flexible output management.
• The text embedding process has been enhanced with an additional
tokenizer option for improved performance.
• A new `ExtendableDataPoint` class has been introduced for future
extensions.
• New JSON files for companies and individuals have been added to
facilitate testing and data processing.

- **Improvements**
• Search functionality now uses updated identifiers for more reliable
content retrieval.
• Metadata handling has been streamlined across various classes by
removing unnecessary type specifications.
• Enhanced serialization of properties in the Neo4j adapter for improved
handling of complex structures.
• The setup process for databases has been improved with a new
asynchronous setup function.

- **Chores**
• Dependency and configuration updates improve overall stability and
performance.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-08 02:00:15 +01:00
Igor Ilic
5fe7ff9883
refactor: Refactor search so graph completion is used by default (#505)
<!-- .github/pull_request_template.md -->

## Description
Refactor search so query type doesn't need to be provided to make it
simpler for new users

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Improved the search interface by standardizing parameter usage with
explicit keyword arguments for specifying search types, enhancing
clarity and consistency.
- **Tests**
- Updated test cases and example integrations to align with the revised
search parameters, ensuring consistent behavior and reliable validation
of search outcomes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-07 17:16:34 +01:00
Vasilije
4d3acc358a
fix: mcp improvements (#472)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Dependency Update**
	- Downgraded `mcp` package version from 1.2.0 to 1.1.3
- Updated `cognee` dependency to include additional features with
`cognee[codegraph]`

- **New Features**
- Introduced a new tool, "codify", for transforming codebases into
knowledge graphs
- Enhanced the existing "search" tool to accept a new parameter for
search type

- **Improvements**
	- Streamlined search functionality with a new modular approach
- Added new asynchronous function for retrieving and formatting code
parts

- **Documentation**
- Updated import paths for `SearchType` in various modules and tests to
reflect structural changes

- **Code Cleanup**
	- Removed legacy search module and associated classes/functions
	- Refined data transfer object classes for consistency and clarity
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Boris Arzentar <borisarzentar@gmail.com>
2025-02-04 08:47:31 +01:00
Igor Ilic
8879f3fbbe
feat: Add gemini support [COG-1023] (#485)
<!-- .github/pull_request_template.md -->

## Description
PR to test Gemini PR from holchan

1. Add Gemini LLM and Gemini Embedding support 
2. Fix CodeGraph issue with chunks being bigger than maximum token value
3. Add Tokenizer adapters to CodeGraph

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
    - Added support for the Gemini LLM provider.
    - Expanded LLM configuration options.
- Introduced a new GitHub Actions workflow for multimetric QA
evaluation.
- Added new environment variables for LLM and embedding configurations
across various workflows.

- **Bug Fixes**
    - Improved error handling in various components.
    - Updated tokenization and embedding processes.
    - Removed warning related to missing `dict` method in data items.

- **Refactor**
    - Simplified token extraction and decoding methods.
    - Updated tokenizer interfaces.
    - Removed deprecated dependencies.
    - Enhanced retry logic and error handling in embedding processes.

- **Documentation**
    - Updated configuration comments and settings.

- **Chores**
- Updated GitHub Actions workflows to accommodate new secrets and
environment variables.
    - Modified evaluation parameters.
    - Adjusted dependency management for optional libraries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: holchan <61059652+holchan@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2025-01-31 18:03:23 +01:00
Vasilije
8a50da8ff5
Merge pull request #475 from topoteretes/feat/COG-1060-code-pipeline-endpoints
feat: add codegraph related API endpoints
2025-01-28 14:46:52 +01:00
alekszievr
5e076689ad
Feat: [COG-1074] fix multimetric eval bug (#463)
* feat: make tasks a configurable argument in the cognify function

* fix: add data points task

* Ugly hack for multi-metric eval bug

* some cleanup

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-01-28 13:05:22 +01:00
Boris Arzentar
3320bc8f2c feat: add codegraph related API endpoints 2025-01-28 10:08:59 +01:00
alekszievr
4e3a666b33
Feat: Save and load contexts and answers for eval (#462)
* feat: make tasks a configurable argument in the cognify function

* fix: add data points task

* eval on random samples instead of first couple

* Save and load contexts and answers

* Fix random seed usage and handle empty descriptions

* include insights search in cognee option

* create output dir if doesnt exist

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-01-22 16:17:01 +01:00
alekszievr
75bc7f67eb
feat: Add incremental eval option to paramset (#446)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval;2C

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Support 4 different rag options in eval

* Minor refactor and logger usage

* feat: make tasks a configurable argument in the cognify function

* Run eval on a set of parameters and save results as json and png

* fix: add data points task

* script for running all param combinations

* enable context provider to get tasks as param

* bugfix in simple rag

* Incremental eval of cognee pipeline

* potential fix: single asyncio run

* temp fix: exclude insights

* Remove insights, have single asyncio run, refactor

* Include incremental eval in accepted paramsets

* minor fixes

* handle pipeline slices in utils

* Handle insights and customize search types

* Handle retrieved edges more safely

* bugfix

* fix simple rag

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-01-17 18:04:31 +01:00
alekszievr
2e010f8dd1
Incremental eval of cognee pipeline (#445)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval;2C

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Support 4 different rag options in eval

* Minor refactor and logger usage

* feat: make tasks a configurable argument in the cognify function

* Run eval on a set of parameters and save results as json and png

* fix: add data points task

* script for running all param combinations

* enable context provider to get tasks as param

* bugfix in simple rag

* Incremental eval of cognee pipeline

* potential fix: single asyncio run

* temp fix: exclude insights

* Remove insights, have single asyncio run, refactor

* minor fixes

* handle pipeline slices in utils

* include all options in params json

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2025-01-17 14:16:48 +01:00
alekszievr
8ec1e48ff6
Run eval on a set of parameters and save them as png and json (#443)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval;2C

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Support 4 different rag options in eval

* Minor refactor and logger usage

* Run eval on a set of parameters and save results as json and png

* script for running all param combinations

* bugfix in simple rag

* potential fix: single asyncio run

* temp fix: exclude insights

* Remove insights, have single asyncio run, refactor

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
2025-01-17 00:18:51 +01:00
alekszievr
3494521cae
Support 4 different rag options in eval (#439)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval;2C

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Support 4 different rag options in eval

* Minor refactor and logger usage
2025-01-15 15:34:13 +01:00
alekszievr
6653d73556
Feat/cog 950 improve metric selection (#435)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* restructure metric selection

* Add comprehensiveness, diversity and empowerment metrics

* add promptfoo as an option

* refactor RAG solution in eval;2C

* LLM as a judge metrics implemented in a uniform way

* Use requests.get instead of wget

* clean up promptfoo config template

* minor fixes

* get promptfoo path instead of hardcoding

* minor fixes

* Add LLM as a judge prompts

* Minor refactor and logger usage
2025-01-15 10:45:55 +01:00
alekszievr
a4ad1702ed
Feat/cog 946 abstract eval dataset (#418)
* QA eval dataset as argument, with hotpot and 2wikimultihop as options. Json schema validation for datasets.

* Load dataset file by filename, outsource utilities

* Use requests.get instead of wget
2025-01-14 11:33:55 +01:00
hajdul88
e2ad54d88e Fix: deleting incorrect repo path 2025-01-10 15:54:45 +01:00
hajdul88
6177d04b44 feat: implements code retreiver 2025-01-10 13:03:34 +01:00
hajdul88
9604d95ba5 feat: adds basic retriever for swe bench 2025-01-09 19:54:58 +01:00
Rita Aleksziev
18bb282fbc Adjust SWE-bench script to code graph pipeline call 2025-01-09 14:52:02 +01:00
vasilije
76a0aa7e8b Fix linter issues 2025-01-05 19:48:35 +01:00
vasilije
6dafe73a6b Fix linter issues 2025-01-05 19:24:55 +01:00
vasilije
649fcf2ba8 Fix linter issues 2025-01-05 19:21:09 +01:00
vasilije
60c8fd103b ruff format 2025-01-05 19:09:08 +01:00
lxobr
da5e3ab24d
COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
2024-12-17 12:02:25 +01:00
alekszievr
4f2745504c
Calculate official hotpot EM and F1 scores (#292) 2024-12-10 19:16:12 +01:00
Boris
348610e73c
fix: refactor get_graph_from_model to return nodes and edges correctly (#257)
* fix: handle rate limit error coming from llm model

* fix: fixes lost edges and nodes in get_graph_from_model

* fix: fixes database pruning issue in pgvector (#261)

* fix: cognee_demo notebook pipeline is not saving summaries

---------

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
2024-12-06 12:52:01 +01:00
Boris Arzentar
d49ab4c3b5 feat: update code-graph notebook 2024-12-03 23:48:12 +01:00
Boris Arzentar
b89a4b8054 Merge remote-tracking branch 'origin/main' into code-graph 2024-12-03 21:14:19 +01:00
Rita Aleksziev
a0d5102bd8 add some spaces for readability 2024-12-03 17:22:23 +01:00
Rita Aleksziev
0fbb50960b prompt renaming 2024-12-03 15:59:03 +01:00
Rita Aleksziev
dc082de4c2 minor bugfix in folder creation 2024-12-02 14:54:40 +01:00
Rita Aleksziev
f966f099fc Prompt renaming to more specific names. Minor code changes. 2024-12-02 12:18:00 +01:00
Boris Arzentar
11acabdb6a fix: remove duplicate nodes and edges before saving; Fix FalkorDB vector index; 2024-12-02 10:10:18 +01:00
Rita Aleksziev
a4c56f118d Connect code graph pipeline + retriever + benchmarking 2024-11-29 15:24:49 +01:00
Rita Aleksziev
4da1657140 merge changes from code-graph 2024-11-29 12:16:36 +01:00
Rita Aleksziev
8f241fa6c5 convert edge to string 2024-11-29 12:05:52 +01:00
Leon Luithlen
a5ae9185cd Replicate PR 33 2024-11-29 11:40:51 +01:00