Anya Belz
Researcher at University of Brighton
Publications - 25
Citations - 227
Anya Belz is an academic researcher from the University of Brighton. The author has contributed to research on the topics of Computer science and Task (project management). The author has an h-index of 5 and has co-authored 12 publications receiving 86 citations.
Papers
Proceedings Article
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions
David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser +9 more
TL;DR: Due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG presented as extremely confused in 2020; the field is in urgent need of standard methods and terminology.
Proceedings Article
Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing
TL;DR: This work proposes a classification system for evaluations based on disentangling what is being evaluated and how it is evaluated, in terms of specific evaluation modes and experimental designs, and shows that this approach provides a basis for determining comparability, and hence for comparing evaluations across papers, meta-evaluation experiments, and reproducibility testing.
Proceedings ArticleDOI
A Systematic Review of Reproducibility Research in Natural Language Processing
TL;DR: The authors provide a wide-angle, and as near as possible complete, snapshot of current work on reproducibility in NLP, covering how to define, measure, and address reproducibility in the field.
Proceedings ArticleDOI
Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Damir Jurić, Jack Flann, Ehud Reiter, Anya Belz, Aleksandar Savkov +7 more
TL;DR: An extensive human evaluation study of consultation notes finds that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics like BERTScore.
Posted Content
The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP.
Anastasia Shimorina, Anya Belz +1 more
TL;DR: The Human Evaluation Datasheet is a template for recording the details of individual human evaluation experiments in Natural Language Processing (NLP). It is intended to facilitate the recording of properties of human evaluations in sufficient detail, and with sufficient standardisation, to support comparability, meta-evaluation, and reproducibility tests.