
Anya Belz

Researcher at University of Brighton

Publications -  25
Citations -  227

Anya Belz is an academic researcher at the University of Brighton. The author has contributed to research on topics including Computer science and Task (project management). The author has an h-index of 5 and has co-authored 12 publications receiving 86 citations.

Papers
Proceedings Article

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

TL;DR: Due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG presents as extremely confused in 2020, and the field is in urgent need of standard methods and terminology.
Proceedings Article

Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing

TL;DR: This work proposes a classification system for evaluations based on disentangling what is being evaluated and how it is evaluated in specific evaluation modes and experimental designs, and shows that this approach provides a basis for determining comparability, hence for comparing evaluations across papers, for meta-evaluation experiments, and for reproducibility testing.
Proceedings ArticleDOI

A Systematic Review of Reproducibility Research in Natural Language Processing

TL;DR: The authors provide a wide-angle, and as near as possible complete, snapshot of current work on reproducibility in NLP, covering how the field defines, measures, and addresses the reproducibility crisis in science.
Proceedings ArticleDOI

Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation

TL;DR: An extensive human evaluation study of consultation notes finds that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics such as BERTScore.
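
As a rough illustration of the kind of metric referred to above, the following is a minimal sketch of a character-based Levenshtein similarity score. The function names and the length-based normalisation are assumptions made for illustration, not details taken from the paper.

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(reference: str, generated: str) -> float:
    """Normalise edit distance to a 0-1 similarity score (higher is better).
    The max-length normalisation is one common choice, assumed here."""
    if not reference and not generated:
        return 1.0
    return 1.0 - levenshtein(reference, generated) / max(len(reference), len(generated))

# Example: score a generated consultation note against a reference note.
print(levenshtein_similarity("Patient reports mild headache.",
                             "Patient reports a mild headache."))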
Posted Content

The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP

TL;DR: The Human Evaluation Datasheet is a template for recording the details of individual human evaluation experiments in Natural Language Processing (NLP). It is intended to facilitate the recording of properties of human evaluations in sufficient detail, and with sufficient standardisation, to support comparability, meta-evaluation, and reproducibility tests.