Add comprehensive test dataset with 7 domain-specific Wikipedia documents
(climate, finance, medical, sports) and corresponding test cases in JSON format.
Total of 2292 lines of test data across 8 files for RAG quality evaluation
and end-to-end testing infrastructure.