cognee/evals/eval_framework/benchmark_adapters/twowikimultihop_adapter.py
lxobr 4b7c21d7d8
feat: retrieve golden contexts [COG-1364] (#579)

## Description
• Added a `load_golden_context` parameter to the abstract `load_corpus`
method of `BaseBenchmarkAdapter`, establishing a common interface for
retrieving supporting evidence.
• Refactored `HotpotQAAdapter` into a modular design: introduced a
`_get_metadata_field_name` method to handle dataset-specific fields
(making it extensible for child classes) and implemented golden-context
retrieval.
• Refactored `TwoWikiMultihopAdapter` to inherit from `HotpotQAAdapter`,
overriding only the necessary methods while reusing the parent's
functionality.
• Added golden-context support to `MusiqueQAAdapter`, using its
decomposition-based format.
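
The template-method layout described above can be sketched in isolation. Everything here is hypothetical: `SketchBenchmarkAdapter`, `SketchHotpotAdapter` and `SketchTwoWikiAdapter` stand in for the real classes, an in-memory `items` list replaces the actual download step, the golden context is simplified to supporting-fact titles, and the `"level"` metadata field for HotpotQA is an assumption. Only the `load_golden_context` flag and the `_get_metadata_field_name` hook come from the PR description.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class SketchBenchmarkAdapter(ABC):
    """Hypothetical stand-in for BaseBenchmarkAdapter."""

    @abstractmethod
    def load_corpus(
        self, limit: Optional[int] = None, load_golden_context: bool = False
    ) -> tuple[list[str], list[dict[str, Any]]]:
        """Return (corpus paragraphs, QA pairs)."""


class SketchHotpotAdapter(SketchBenchmarkAdapter):
    def __init__(self, items: list[dict[str, Any]]):
        # Assumption: the dataset is already loaded into memory.
        self.items = items

    def _get_metadata_field_name(self) -> str:
        # Hook for dataset-specific metadata; "level" is an assumed HotpotQA field.
        return "level"

    def _get_golden_context(self, item: dict[str, Any]) -> str:
        # Simplified placeholder: one line per supporting-fact title.
        return "\n".join(title for title, _ in item["supporting_facts"])

    def load_corpus(
        self, limit: Optional[int] = None, load_golden_context: bool = False
    ) -> tuple[list[str], list[dict[str, Any]]]:
        items = self.items if limit is None else self.items[:limit]
        field = self._get_metadata_field_name()
        corpus: list[str] = []
        qa_pairs: list[dict[str, Any]] = []
        for item in items:
            # Each context entry is a (title, sentences) pair.
            for title, sentences in item["context"]:
                corpus.append(f"{title}: {' '.join(sentences)}")
            qa = {"question": item["question"], "answer": item["answer"], field: item[field]}
            if load_golden_context:
                # Attach supporting evidence only when requested.
                qa["golden_context"] = self._get_golden_context(item)
            qa_pairs.append(qa)
        return corpus, qa_pairs


class SketchTwoWikiAdapter(SketchHotpotAdapter):
    def _get_metadata_field_name(self) -> str:
        # 2WikiMultihopQA labels each question with a "type" field.
        return "type"


# Made-up toy record in HotpotQA-style layout (not real dataset content).
item = {
    "question": "Which film came out first?",
    "answer": "Film A",
    "type": "comparison",
    "context": [["Film A", ["Film A is a 1999 drama."]], ["Film B", ["Film B is a 2005 comedy."]]],
    "supporting_facts": [["Film A", 0], ["Film B", 0]],
}
corpus, qa_pairs = SketchTwoWikiAdapter([item]).load_corpus(load_golden_context=True)
print(qa_pairs[0]["golden_context"])
```

The subclass overrides a single hook and inherits `load_corpus` unchanged, which is the kind of reuse the refactor aims for; the golden context is attached only when the flag is set.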

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


## Summary by CodeRabbit

- **New Features**
  - Introduced an option to include golden (supporting) context during
    corpus loading, so generated QA pairs can carry the evidence behind
    each answer.
- **Refactor**
  - Streamlined and modularized the processing workflow across the
    benchmark adapters for improved consistency and maintainability.
  - Updated metadata extraction so each adapter can specify its own
    dataset-specific metadata field.
  - `TwoWikiMultihopAdapter` no longer implements its own corpus loading;
    it reuses `HotpotQAAdapter` and overrides only golden-context
    extraction.
2025-02-27 13:25:47 +01:00


from typing import Any

from evals.eval_framework.benchmark_adapters.hotpot_qa_adapter import HotpotQAAdapter


class TwoWikiMultihopAdapter(HotpotQAAdapter):
    # Local filename and download URL for the 2WikiMultihopQA dev split.
    dataset_info = {
        "filename": "2wikimultihop_dev.json",
        "url": "https://huggingface.co/datasets/voidful/2WikiMultihopQA/resolve/main/dev.json",
    }

    def __init__(self):
        super().__init__()
        # 2WikiMultihopQA stores the question category in the "type" field.
        self.metadata_field_name = "type"

    def _get_golden_context(self, item: dict[str, Any]) -> str:
        """Extracts and formats the golden context from supporting facts and adds evidence if available."""
        # Start from the supporting-fact context built by the parent adapter.
        golden_context = super()._get_golden_context(item)
        if "evidences" in item:
            # Append each (subject, relation, object) triplet on its own line.
            golden_context += "\nEvidence fact triplets:"
            for subject, relation, obj in item["evidences"]:
                golden_context += f"\n{subject} - {relation} - {obj}"
        return golden_context
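
To see what the override produces, here is a self-contained sketch. `StubHotpotQAAdapter` is a hypothetical stand-in for the real parent: its `_get_golden_context` assumes HotpotQA-style `context` entries of (title, sentences) and `supporting_facts` of (title, sentence index), and the real `HotpotQAAdapter` output format may differ. The record is a made-up toy example, not dataset content.

```python
from typing import Any


class StubHotpotQAAdapter:
    """Hypothetical stand-in for HotpotQAAdapter's golden-context logic."""

    def _get_golden_context(self, item: dict[str, Any]) -> str:
        # Map each paragraph title to its list of sentences.
        context = {title: sentences for title, sentences in item["context"]}
        lines = []
        # Each supporting fact is assumed to be a (title, sentence_index) pair.
        for title, idx in item["supporting_facts"]:
            sentences = context.get(title, [])
            if 0 <= idx < len(sentences):
                lines.append(f"{title}: {sentences[idx]}")
        return "\n".join(lines)


class StubTwoWikiAdapter(StubHotpotQAAdapter):
    def _get_golden_context(self, item: dict[str, Any]) -> str:
        # Same override logic as TwoWikiMultihopAdapter._get_golden_context.
        golden_context = super()._get_golden_context(item)
        if "evidences" in item:
            golden_context += "\nEvidence fact triplets:"
            for subject, relation, obj in item["evidences"]:
                golden_context += f"\n{subject} - {relation} - {obj}"
        return golden_context


item = {
    "context": [
        ["Film A", ["Film A is a 1999 drama.", "It was directed by Jane Doe."]],
        ["Film B", ["Film B is a 2005 comedy."]],
    ],
    "supporting_facts": [["Film A", 0], ["Film B", 0]],
    "evidences": [["Film A", "publication date", "1999"], ["Film B", "publication date", "2005"]],
}

print(StubTwoWikiAdapter()._get_golden_context(item))
# Film A: Film A is a 1999 drama.
# Film B: Film B is a 2005 comedy.
# Evidence fact triplets:
# Film A - publication date - 1999
# Film B - publication date - 2005
```

Records without an `evidences` key fall back to the parent's supporting-fact text unchanged.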