Commit graph

3 commits

Author SHA1 Message Date
lxobr
4b7c21d7d8
feat: retrieve golden contexts [COG-1364] (#579)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
• Added load_golden_context parameter to BaseBenchmarkAdapter's abstract
load_corpus method, establishing a common interface for retrieving
supporting evidence
• Refactored HotpotQAAdapter with a modular design: introduced
_get_metadata_field_name method to handle dataset-specific fields
(making it extensible for child classes), implemented get golden context
functionality.
• Refactored TwoWikiMultihopAdapter to inherit from HotpotQAAdapter,
overriding only the necessary methods while reusing parent's
functionality
• Added golden context support to MusiqueQAAdapter with their
decomposition-based format
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced an option to include additional context during corpus
loading, enhancing the quality and flexibility of generated QA pairs.
- **Refactor**
- Streamlined and modularized the processing workflow across different
adapters for improved consistency and maintainability.
- Updated metadata extraction to refine the display of contextual
information.
- Shifted focus in the `TwoWikiMultihopAdapter` from corpus loading to
context extraction.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-27 13:25:47 +01:00
alekszievr
17231de5d0
Test: Parse context pieces separately in MusiqueQAAdapter and adjust tests [cog-1234] (#561)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Tests**
- Updated evaluation checks by removing assertions related to the
relationship between `corpus_list` and `qa_pairs`, now focusing solely
on `qa_pairs` limits.

- **Refactor**
- Improved content processing to append each paragraph individually to
`corpus_list`, enhancing clarity in data structure.
- Simplified type annotations in the `load_corpus` method across
multiple adapters, ensuring consistency in return types.

- **Chores**
- Updated dependency installation commands in GitHub Actions workflows
for Python 3.10, 3.11, and 3.12 to include additional evaluation-related
dependencies.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
2025-02-20 14:23:53 +01:00
hajdul88
6a0c0e3ef8
feat: Cognee evaluation framework development (#498)
<!-- .github/pull_request_template.md -->

This PR contains the evaluation framework development for cognee

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Expanded evaluation framework now integrates asynchronous corpus
building, question answering, and performance evaluation with adaptive
benchmarks for improved metrics (correctness, exact match, and F1
score).

- **Infrastructure**
- Added database integration for persistent storage of questions,
answers, and metrics.
- Launched an interactive metrics dashboard featuring advanced
visualizations.
- Introduced an automated testing workflow for continuous quality
assurance.

- **Documentation**
  - Updated guidelines for generating concise, clear answers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-02-11 16:31:54 +01:00