Anya Belz
Researcher at University of Brighton
Publications - 25
Citations - 227
Anya Belz is an academic researcher from the University of Brighton. The author has contributed to research on the topics of Computer science and Task (project management). The author has an h-index of 5 and has co-authored 12 publications receiving 86 citations.
Papers
Proceedings Article
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions
David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser +9 more
TL;DR: Due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG presented as extremely confused in 2020; the field is in urgent need of standard methods and terminology.
Proceedings Article
Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing
TL;DR: This work proposes a classification system for evaluations based on disentangling what is being evaluated and how it is evaluated, in terms of specific evaluation modes and experimental designs, and shows that this approach provides a basis for determining comparability, and hence for comparing evaluations across papers, meta-evaluation experiments, and reproducibility testing.
Proceedings ArticleDOI
A Systematic Review of Reproducibility Research in Natural Language Processing
TL;DR: The authors provide a wide-angle, and as near as possible complete, snapshot of current work on reproducibility in NLP, covering how to define, measure, and address reproducibility in the field.
Proceedings ArticleDOI
Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Damir Jurić, Jack Flann, Ehud Reiter, Anya Belz, Aleksandar Savkov +7 more
TL;DR: An extensive human evaluation study of consultation notes finds that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics like BERTScore.
Posted Content
The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP.
Anastasia Shimorina, Anya Belz +1 more
TL;DR: The Human Evaluation Datasheet is a template for recording the details of individual human evaluation experiments in Natural Language Processing (NLP). It is intended to facilitate the recording of properties of human evaluations in sufficient detail, and with sufficient standardisation, to support comparability, meta-evaluation, and reproducibility tests.