Open Access · Proceedings ArticleDOI

Creating Training Corpora for NLG Micro-Planners

TL;DR:
This paper proposes the corpus generation framework as a novel method for creating challenging data sets from which NLG models can be learned that are capable of handling the complex interactions occurring during micro-planning between lexicalisation, aggregation, surface realisation, referring expression generation and sentence segmentation.
Abstract
In this paper, we present a novel framework for semi-automatically creating linguistically challenging micro-planning data-to-text corpora from existing Knowledge Bases. Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for micro-planning. Another feature of this framework is that it can be applied to any large-scale knowledge base and can therefore be used to train KB verbalisers. We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while Wen et al.'s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging data sets from which NLG models can be learned that are capable of handling the complex interactions occurring during micro-planning between lexicalisation, aggregation, surface realisation, referring expression generation and sentence segmentation. To encourage researchers to take up this challenge, we have made available a dataset of 21,855 data/text pairs created using this framework in the context of the WebNLG shared task.
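To make the shape of such a corpus concrete, the sketch below shows what one data/text pair might look like: a small set of DBpedia-style (subject, property, object) triples paired with a verbalisation. The specific triples, text, and the `verbalised_entities` helper are illustrative assumptions, not taken from the released dataset.

```python
# A hypothetical WebNLG-style data/text pair: DBpedia-like RDF triples
# (subject, property, object) paired with a human-written verbalisation.
# The triples, text, and helper below are illustrative only.
pair = {
    "triples": [
        ("Alan_Bean", "birthPlace", "Wheeler,_Texas"),
        ("Alan_Bean", "occupation", "Test_pilot"),
    ],
    "text": "Alan Bean, who was born in Wheeler, Texas, worked as a test pilot.",
}

def verbalised_entities(pair):
    """Naively check which triple subjects/objects surface in the text."""
    text = pair["text"].lower()
    mentioned = set()
    for subj, _prop, obj in pair["triples"]:
        for entity in (subj, obj):
            # Strip underscores and any trailing disambiguation after a comma.
            surface = entity.replace("_", " ").split(",")[0].lower()
            if surface in text:
                mentioned.add(entity)
    return mentioned
```

Note how a single short sentence aggregates two triples and uses a relative clause for one of them; this interaction between aggregation and sentence segmentation is exactly what the benchmark is meant to stress.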


Citations
Proceedings ArticleDOI

The WebNLG Challenge: Generating Text from RDF Data

TL;DR: The microplanning task is introduced, data preparation, evaluation methodology, participant results and a brief description of the participating systems are provided.
Journal ArticleDOI

Survey of Hallucination in Natural Language Generation

TL;DR: This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG by providing a broad overview of the research progress and challenges in the hallucination problem in NLG.
Proceedings ArticleDOI

Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism

TL;DR: This paper proposes an end-to-end model based on sequence-to-sequence learning with a copy mechanism, which can jointly extract relational facts from sentences of all three overlap classes: Normal, EntityPairOverlap and SingleEntityOverlap.
Proceedings ArticleDOI

GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction

TL;DR: GraphRel is an end-to-end relation extraction model which uses graph convolutional networks (GCNs) to jointly learn named entities and relations; it outperforms previous work by 3.2% and 5.8% and achieves a new state-of-the-art for relation extraction.
References
Proceedings ArticleDOI

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
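The core of BLEU is modified n-gram precision combined with a brevity penalty. A minimal pure-Python sketch of modified unigram precision (the clipping step only, not the full metric, which combines precisions up to 4-grams):

```python
from collections import Counter

def modified_unigram_precision(candidate, references):
    """Modified unigram precision: each candidate word is credited at most
    as many times as it occurs in any single reference (count clipping)."""
    cand_counts = Counter(candidate)
    clipped = 0
    for word, count in cand_counts.items():
        max_ref_count = max(Counter(ref)[word] for ref in references)
        clipped += min(count, max_ref_count)
    return clipped / len(candidate)

# Degenerate candidate from the BLEU paper: clipping caps credit for "the"
# at its maximum reference count (2), giving 2/7 rather than 7/7.
cand = "the the the the the the the".split()
refs = ["the cat is on the mat".split(),
        "there is a cat on the mat".split()]
print(modified_unigram_precision(cand, refs))  # → 2/7 ≈ 0.286
```

Clipping is what penalises candidates that over-generate common reference words, which plain precision would reward.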
Proceedings ArticleDOI

The Stanford CoreNLP Natural Language Processing Toolkit

TL;DR: The design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis, is described; its broad uptake is attributed to a simple, approachable design, straightforward interfaces, the inclusion of robust, good-quality analysis components, and the absence of heavyweight associated dependencies.
Proceedings Article

SRILM – An Extensible Language Modeling Toolkit

TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation are discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Proceedings Article

Grammar as a foreign language

TL;DR: The domain-agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus that was annotated using existing parsers.
Proceedings ArticleDOI

Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems

TL;DR: A statistical language generator based on a semantically controlled Long Short-term Memory (LSTM) structure that can learn from unaligned data by jointly optimising sentence planning and surface realisation with a simple cross-entropy training criterion; language variation can be achieved simply by sampling from output candidates.