Author

Saad Mahamood

Other affiliations: King's College, Aberdeen
Bio: Saad Mahamood is an academic researcher from the University of Aberdeen. The author has contributed to research in the topics of Natural Language Generation and Computer Science. The author has an h-index of 9 and has co-authored 19 publications receiving 387 citations. Previous affiliations of Saad Mahamood include King's College, Aberdeen.

Papers
Journal ArticleDOI
TL;DR: This paper describes recent and ongoing work on building systems that automatically generate textual summaries of neonatal data, showing that the technology is viable and comparable in its effectiveness for decision support to existing presentation modalities.
Abstract: Contemporary Neonatal Intensive Care Units collect vast amounts of patient data in various formats, making efficient processing of information by medical professionals difficult. Moreover, different stakeholders in the neonatal scenario, which include parents as well as staff occupying different roles, have different information requirements. This paper describes recent and ongoing work on building systems that automatically generate textual summaries of neonatal data. Our evaluation results show that the technology is viable and comparable in its effectiveness for decision support to existing presentation modalities. We discuss the lessons learned so far, as well as the major challenges involved in extending current technology to deal with a broader range of data types, and to improve the textual output in the form of more coherent summaries.

138 citations

Proceedings Article
01 Dec 2020
TL;DR: Due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG presents as extremely confused in 2020, and the field is in urgent need of standard methods and terminology.
Abstract: Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility. In this paper, we present (i) our dataset of 165 NLG papers with human evaluations, (ii) the annotation scheme we developed to label the papers for different aspects of evaluations, (iii) quantitative analyses of the annotations, and (iv) a set of recommendations for improving standards in evaluation reporting. We use the annotations as a basis for examining information included in evaluation reports, and levels of consistency in approaches, experimental design and terminology, focusing in particular on the 200+ different terms that have been used for evaluated aspects of quality. We conclude that due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG presents as extremely confused in 2020, and that the field is in urgent need of standard methods and terminology.

95 citations

Proceedings Article
28 Sep 2011
TL;DR: This paper presents several affective NLG strategies for generating medical texts for parents of pre-term neonates, and shows that all recipients preferred texts generated with the affective strategies, regardless of predicted stress level.
Abstract: This paper presents several affective NLG strategies for generating medical texts for parents of pre-term neonates. Initially, these were meant to be personalised according to a model of the recipient's level of stress. However, our evaluation showed that all recipients preferred texts generated with the affective strategies, regardless of predicted stress level.

55 citations

Proceedings ArticleDOI
01 Sep 2015
TL;DR: A snapshot of end-to-end NLG system evaluations as presented in conference and journal papers over the last ten years is presented to better understand the nature and type of evaluations that have been undertaken.
Abstract: In this paper we present a snapshot of end-to-end NLG system evaluations as presented in conference and journal papers over the last ten years in order to better understand the nature and type of evaluations that have been undertaken. We find that researchers tend to favour specific evaluation methods, and that their evaluation approaches are also correlated with the publication venue. We further discuss what factors may influence the types of evaluation used for a given NLG system.

49 citations

Posted Content
TL;DR: GEM as discussed by the authors is a living benchmark for natural language generation (NLG), its Evaluation and Metrics, which provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested.
Abstract: We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.

44 citations
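GEM's tasks are commonly accessed through the Hugging Face `datasets` library, so a minimal sketch of applying a model to one of its tasks might look like the following. The dataset name `gem` and the `common_gen` configuration are assumptions for illustration, not details taken from the abstract above.

```python
# Minimal sketch: loading one GEM task with the Hugging Face `datasets` library.
# The dataset name "gem" and the configuration "common_gen" are assumptions;
# substitute the task configuration you actually want to benchmark on.
from datasets import load_dataset

dataset = load_dataset("gem", "common_gen")

# Each split pairs a structured input with one or more reference texts,
# so a single model can be applied to many tasks through the same interface.
example = dataset["validation"][0]
print(sorted(example.keys()))
print(example)
```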


Cited by
Journal ArticleDOI
TL;DR: A survey of the state of the art in natural language generation can be found in this article, with an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organized.
Abstract: This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past two decades, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of NLP, with an emphasis on different evaluation methods and the relationships between them.

562 citations

Proceedings ArticleDOI
10 Sep 2017
TL;DR: A wide range of metrics are investigated, including state-of-the-art word-based and novel grammar-based ones, and it is demonstrated that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG.
Abstract: The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG. We also show that metric performance is data- and system-specific. Nevertheless, our results also suggest that automatic metrics perform reliably at system-level and can support system development by finding cases where a system performs poorly.

421 citations
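Word-overlap metrics of the kind discussed above are typically compared against human judgements by computing a rank correlation over per-output scores. A minimal sketch of that style of meta-evaluation, using sentence-level BLEU from `sacrebleu` and Spearman's rho from `scipy`, is shown below; the outputs, references, and ratings are invented placeholders, not data from the paper.

```python
# Minimal sketch of metric meta-evaluation: score each system output against
# its reference with sentence-level BLEU, then check how well those scores
# track human judgements via Spearman's rank correlation.
# All strings and ratings below are illustrative placeholders.
import sacrebleu
from scipy.stats import spearmanr

system_outputs = [
    "there is a moderately priced restaurant in the city centre",
    "the restaurant serves french food near the riverside",
    "it is a family friendly place",
]
references = [
    "there is a moderately priced restaurant in the centre of town",
    "a french restaurant can be found near the riverside",
    "the venue is not family friendly",
]
human_scores = [5.0, 4.0, 2.0]  # e.g. mean adequacy ratings on a 1-5 scale

bleu_scores = [
    sacrebleu.sentence_bleu(hyp, [ref]).score
    for hyp, ref in zip(system_outputs, references)
]

rho, p_value = spearmanr(bleu_scores, human_scores)
print(f"Spearman rho between sentence BLEU and human ratings: {rho:.2f} (p={p_value:.2f})")
```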

Journal ArticleDOI
TL;DR: BLOOM as discussed by the authors is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

407 citations
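Because the BLOOM models and code are released openly, a small usage sketch is possible. The one below loads a reduced checkpoint through the Hugging Face `transformers` library; the `bigscience/bloom-560m` model id and the generation settings are assumptions for illustration, not details taken from the paper above.

```python
# Minimal sketch: generating text with an open BLOOM checkpoint via the
# Hugging Face `transformers` library. The small bloom-560m variant is used
# purely for illustration; the full 176B model requires multi-GPU serving.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # assumed checkpoint id for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Natural language generation is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```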

Journal ArticleDOI
TL;DR: A prototype, called BT-45, is presented, which generates textual summaries of about 45 minutes of continuous physiological signals and discrete events and brings together techniques from the different areas of signal processing, medical reasoning, knowledge engineering, and natural language generation.

279 citations
