Author

Akhila Yerukola

Bio: Akhila Yerukola is an academic researcher from Stanford University. The author has contributed to research on semantic similarity and language models, has an h-index of 4, and has co-authored 6 publications receiving 155 citations.

Papers

Proceedings Article•DOI•

Do Massively Pretrained Language Models Make Better Storytellers

Abigail See¹, Aneesh S. Pappu¹, Rohun Saxena¹, Akhila Yerukola¹, Christopher D. Manning¹

Stanford University¹

01 Nov 2019

TL;DR: The authors compare the performance of an extensively pretrained model, OpenAI GPT2-117, to a state-of-the-art neural story generation model (Fan et al., 2018).

Abstract: Large neural language models trained on massive amounts of text have emerged as a formidable strategy for Natural Language Understanding tasks. However, the strength of these models as Natural Language Generators is less clear. Though anecdotal evidence suggests that these models generate better quality text, there has been no detailed study characterizing their generation abilities. In this work, we compare the performance of an extensively pretrained model, OpenAI GPT2-117 (Radford et al., 2019), to a state-of-the-art neural story generation model (Fan et al., 2018). By evaluating the generated text across a wide variety of automatic metrics, we characterize the ways in which pretrained models do, and do not, make better storytellers. We find that although GPT2-117 conditions more strongly on context, is more sensitive to ordering of events, and uses more unusual words, it is just as likely to produce repetitive and under-diverse text when using likelihood-maximizing decoding algorithms.

103 citations

Posted Content•

Do Massively Pretrained Language Models Make Better Storytellers

Abigail See¹, Aneesh S. Pappu¹, Rohun Saxena¹, Akhila Yerukola¹, Christopher D. Manning¹

Stanford University¹

24 Sep 2019-arXiv: Computation and Language

TL;DR: It is found that although GPT2-117 conditions more strongly on context, is more sensitive to ordering of events, and uses more unusual words, it is just as likely to produce repetitive and under-diverse text when using likelihood-maximizing decoding algorithms.

63 citations

Posted Content•

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics.

Sebastian Gehrmann¹, Tosin P. Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur P. Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou

Google¹

02 Feb 2021-arXiv: Computation and Language

TL;DR: GEM is a living benchmark for natural language generation (NLG), its evaluation, and metrics. It provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested.

Abstract: We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.

44 citations

Proceedings Article•DOI•

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Sebastian Gehrmann¹, Tosin P. Adewumi², Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut³, Khyathi Raghavi Chandu⁴, Miruna-Adriana Clinciu⁵, Dipanjan Das¹, Kaustubh Dhole⁶, Wanyu Du⁷, Esin Durmus⁸, Ondřej Dušek⁹, Chris Chinenye Emezue¹⁰, Varun Gangal⁴, Cristina Garbacea¹¹, Tatsunori Hashimoto¹², Yufang Hou¹³, Yacine Jernite¹⁴, Harsh Jhamtani⁴, Yangfeng Ji⁷, Shailza Jolly¹⁵, Mihir Kale¹, Dhruv Kumar¹⁶, Faisal Ladhak⁸, Aman Madaan⁴, Mounica Maddela, Khyati Mahajan¹⁷, Saad Mahamood¹⁸, Bodhisattwa Prasad Majumder¹⁹, Pedro Henrique Martins²⁰, Angelina McMillan-Major, Simon Mille²¹, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan¹, Vitaly Nikolaev¹, Andre Niyongabo Rubungo, Salomey Osei²², Ankur P. Parikh¹, Laura Perez-Beltrachini²³, Niranjan Ramesh Rao, Vikas Raunak⁴, Juan Diego Rodriguez²⁴, Sashank Santhanam¹⁷, João Sedoc¹⁴, Thibault Sellam¹, Samira Shaikh¹⁷, Anastasia Shimorina²⁵, Marco Antonio Sobrevilla Cabezudo²⁶, Hendrik Strobelt¹³, Nishant Subramani²⁷, Wei Xu²⁸, Diyi Yang²⁹, Akhila Yerukola¹², Jiawei Zhou

Google¹, Luleå University of Technology², Allen Institute for Artificial Intelligence³, Carnegie Mellon University⁴, Heriot-Watt University⁵, Tata Institute of Fundamental Research⁶, University of Virginia⁷, Columbia University⁸, Charles University in Prague⁹, Technische Universität München¹⁰, University of Michigan¹¹, Stanford University¹², IBM¹³, New York University¹⁴, Kaiserslautern University of Technology¹⁵, University of Waterloo¹⁶, University of North Carolina at Charlotte¹⁷, University of Aberdeen¹⁸, University of California, San Diego¹⁹, University of Lisbon²⁰, Pompeu Fabra University²¹, African Institute for Mathematical Sciences²², University of Edinburgh²³, University of Texas at Austin²⁴, Centre national de la recherche scientifique²⁵, University of São Paulo²⁶, Intel²⁷, Georgia Institute of Technology²⁸, Association for Computing Machinery²⁹

02 Feb 2021

Abstract: We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.

26 citations

Journal Article•DOI•

COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements

Xuhui Zhou, Hao Zhu, Akhila Yerukola, Thomas Davidson, Jena D. Hwang, Swabha Swayamdipta, Maarten Sap

03 Jun 2023-arXiv.org

TL;DR: The authors introduce COBRA frames, a context-aware formalism for explaining the intents, reactions, and harms of offensive or biased statements grounded in their social and situational context.

Abstract: Warning: This paper contains content that may be offensive or upsetting. Understanding the harms and offensiveness of statements requires reasoning about the social and situational context in which statements are made. For example, the utterance "your English is very good" may implicitly signal an insult when uttered by a white man to a non-white colleague, but uttered by an ESL teacher to their student would be interpreted as a genuine compliment. Such contextual factors have been largely ignored by previous approaches to toxic language detection. We introduce COBRA frames, the first context-aware formalism for explaining the intents, reactions, and harms of offensive or biased statements grounded in their social and situational context. We create COBRACORPUS, a dataset of 33k potentially offensive statements paired with machine-generated contexts and free-text explanations of offensiveness, implied biases, speaker intents, and listener reactions. To study the contextual dynamics of offensiveness, we train models to generate COBRA explanations, with and without access to the context. We find that explanations by context-agnostic models are significantly worse than by context-aware ones, especially in situations where the context inverts the statement's offensiveness (29% accuracy drop). Our work highlights the importance and feasibility of contextualized NLP by modeling social factors.

2 citations

Cited by

Proceedings Article•DOI•

On Faithfulness and Factuality in Abstractive Summarization

Joshua Maynez¹, Shashi Narayan¹, Bernd Bohnet¹, Ryan McDonald¹

Google¹

02 May 2020

TL;DR: It is found that neural abstractive summarization models are highly prone to hallucinating content that is unfaithful to the input document, and that textual entailment measures correlate better with faithfulness than standard metrics, potentially leading the way to automatic evaluation metrics as well as training and decoding criteria.

Abstract: It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models lead to less human-like responses for open-ended tasks such as language modeling and story generation. In this paper we have analyzed limitations of these models for abstractive document summarization and found that these models are highly prone to hallucinate content that is unfaithful to the input document. We conducted a large scale human evaluation of several neural abstractive summarization systems to better understand the types of hallucinations they produce. Our human annotators found substantial amounts of hallucinated content in all model generated summaries. However, our analysis does show that pretrained models are better summarizers not only in terms of raw metrics, i.e., ROUGE, but also in generating faithful and factual summaries as evaluated by humans. Furthermore, we show that textual entailment measures better correlate with faithfulness than standard metrics, potentially leading the way to automatic evaluation metrics as well as training and decoding criteria.

513 citations

Posted Content•

Evaluation of Text Generation: A Survey

Asli Celikyilmaz¹, Elizabeth Clark², Jianfeng Gao³

Facebook¹, University of Washington², Microsoft³

26 Jun 2020-arXiv: Computation and Language

TL;DR: This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.

Abstract: The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics. For each category, we discuss the progress that has been made and the challenges still being faced, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models. We then present two examples for task-specific NLG evaluations for automatic text summarization and long text generation, and conclude the paper by proposing future research directions.

186 citations

Proceedings Article•DOI•

Enabling Language Models to Fill in the Blanks.

Chris Donahue¹, Mina Lee¹, Percy Liang¹

Stanford University¹

01 Jul 2020

TL;DR: It is shown that infilling by language modeling can enable LMs to infill entire sentences effectively on three different domains (short stories, scientific abstracts, and lyrics), and that humans have difficulty identifying sentences infilled by the approach as machine-generated in the domain of short stories.

Abstract: We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document. While infilling could enable rich functionality especially for writing assistance tools, more attention has been devoted to language modeling—a special case of infilling where text is predicted at the end of a document. In this paper, we aim to extend the capabilities of language models (LMs) to the more general task of infilling. To this end, we train (or fine tune) off-the-shelf LMs on sequences containing the concatenation of artificially-masked text and the text which was masked. We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics. Furthermore, we show that humans have difficulty identifying sentences infilled by our approach as machine-generated in the domain of short stories.

136 citations

Journal Article•DOI•

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation

Jian Guan¹, Fei Huang¹, Zhihao Zhao², Xiaoyan Zhu¹, Minlie Huang¹

Tsinghua University¹, Beihang University²

01 Apr 2020-Transactions of the Association for Computational Linguistics

TL;DR: This article proposes a knowledge-enhanced pretraining model for commonsense story generation, which utilizes commonsense knowledge from external knowledge bases to generate reasonable stories and employs multi-task learning that combines a discriminative objective to distinguish true and fake stories during fine-tuning.

Abstract: Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we employ multi-task learning which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.

117 citations

Posted Content•

A Survey of Knowledge-Enhanced Text Generation.

Wenhao Yu¹, Chenguang Zhu², Zaitang Li³, Zhiting Hu, Qingyun Wang⁴, Heng Ji⁴, Meng Jiang¹

University of Notre Dame¹, Microsoft², The Chinese University of Hong Kong³, University of Illinois at Urbana–Champaign⁴

09 Oct 2020-arXiv: Computation and Language

TL;DR: A comprehensive review of the research on knowledge-enhanced text generation over the past five years is presented, which includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data.

Abstract: The goal of text generation is to make machines express in human language. It is one of the most important yet challenging tasks in natural language processing (NLP). Since 2014, various neural encoder-decoder models pioneered by Seq2Seq have been proposed to achieve the goal by learning to map input text to output text. However, the input text alone often provides limited knowledge to generate the desired output, so the performance of text generation is still far from satisfaction in many real-world scenarios. To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models. This research direction is known as knowledge-enhanced text generation. In this survey, we present a comprehensive review of the research on knowledge enhanced text generation over the past five years. The main content includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data. This survey can have broad audiences, researchers and practitioners, in academia and industry.

115 citations
