Home
/
Authors
/
Karmanya Aggarwal

Author

Karmanya Aggarwal

Bio: Karmanya Aggarwal is an academic researcher. The author has contributed to research in topics: Data collection & Benchmark (computing). The author has an hindex of 2, co-authored 4 publications receiving 51 citations.

Papers

PDF

Open Access

More filters

Posted Content•

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics.

[...]

Sebastian Gehrmann¹, Tosin P. Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur P. Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou - Show less +52 more•Institutions (1)

Google¹

02 Feb 2021-arXiv: Computation and Language

TL;DR: GEM as discussed by the authors is a living benchmark for natural language generation (NLG), its Evaluation and Metrics, which provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested.

...read moreread less

Abstract: We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.

...read moreread less

44 citations

Proceedings Article•DOI•

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

[...]

Sebastian Gehrmann¹, Tosin P. Adewumi², Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut³, Khyathi Raghavi Chandu⁴, Miruna-Adriana Clinciu⁵, Dipanjan Das¹, Kaustubh Dhole⁶, Wanyu Du⁷, Esin Durmus⁸, Ondřej Dušek⁹, Chris Chinenye Emezue¹⁰, Varun Gangal⁴, Cristina Garbacea¹¹, Tatsunori Hashimoto¹², Yufang Hou¹³, Yacine Jernite¹⁴, Harsh Jhamtani⁴, Yangfeng Ji⁷, Shailza Jolly¹⁵, Mihir Kale¹, Dhruv Kumar¹⁶, Faisal Ladhak⁸, Aman Madaan⁴, Mounica Maddela, Khyati Mahajan¹⁷, Saad Mahamood¹⁸, Bodhisattwa Prasad Majumder¹⁹, Pedro Henrique Martins²⁰, Angelina McMillan-Major, Simon Mille²¹, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan¹, Vitaly Nikolaev¹, Andre Niyongabo Rubungo, Salomey Osei²², Ankur P. Parikh¹, Laura Perez-Beltrachini²³, Niranjan Ramesh Rao, Vikas Raunak⁴, Juan Diego Rodriguez²⁴, Sashank Santhanam¹⁷, João Sedoc¹⁴, Thibault Sellam¹, Samira Shaikh¹⁷, Anastasia Shimorina²⁵, Marco Antonio Sobrevilla Cabezudo²⁶, Hendrik Strobelt¹³, Nishant Subramani²⁷, Wei Xu²⁸, Diyi Yang²⁹, Akhila Yerukola¹², Jiawei Zhou - Show less +52 more•Institutions (29)

Google¹, Luleå University of Technology², Allen Institute for Artificial Intelligence³, Carnegie Mellon University⁴, Heriot-Watt University⁵, Tata Institute of Fundamental Research⁶, University of Virginia⁷, Columbia University⁸, Charles University in Prague⁹, Technische Universität München¹⁰, University of Michigan¹¹, Stanford University¹², IBM¹³, New York University¹⁴, Kaiserslautern University of Technology¹⁵, University of Waterloo¹⁶, University of North Carolina at Charlotte¹⁷, University of Aberdeen¹⁸, University of California, San Diego¹⁹, University of Lisbon²⁰, Pompeu Fabra University²¹, African Institute for Mathematical Sciences²², University of Edinburgh²³, University of Texas at Austin²⁴, Centre national de la recherche scientifique²⁵, University of São Paulo²⁶, Intel²⁷, Georgia Institute of Technology²⁸, Association for Computing Machinery²⁹

02 Feb 2021

...read moreread less

Abstract: We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.

...read moreread less

26 citations

Posted Content•

Does Putting a Linguist in the Loop Improve NLU Data Collection

[...]

Alicia Parrish¹, William C. Huang¹, Omar Agha, Soo-Hwan Lee, Nikita Nangia¹, Alex Warstadt¹, Karmanya Aggarwal, Emily Allaway², Tal Linzen¹, Samuel R. Bowman¹ - Show less +6 more•Institutions (2)

New York University¹, Columbia University²

15 Apr 2021-arXiv: Computation and Language

TL;DR: In this paper, the authors take natural language inference as a test case and ask whether it is beneficial to put a linguist ''in the loop'' during data collection to dynamically identify and address gaps in the data by introducing novel constraints on the task.

...read moreread less

Abstract: Many crowdsourced NLP datasets contain systematic gaps and biases that are identified only after data collection is complete. Identifying these issues from early data samples during crowdsourcing should make mitigation more efficient, especially when done iteratively. We take natural language inference as a test case and ask whether it is beneficial to put a linguist `in the loop' during data collection to dynamically identify and address gaps in the data by introducing novel constraints on the task. We directly compare three data collection protocols: (i) a baseline protocol, (ii) a linguist-in-the-loop intervention with iteratively-updated constraints on the task, and (iii) an extension of linguist-in-the-loop that provides direct interaction between linguists and crowdworkers via a chatroom. The datasets collected with linguist involvement are more reliably challenging than baseline, without loss of quality. But we see no evidence that using this data in training leads to better out-of-domain model performance, and the addition of a chat platform has no measurable effect on the resulting dataset. We suggest integrating expert analysis \textit{during} data collection so that the expert can dynamically address gaps and biases in the dataset.

...read moreread less

3 citations

Proceedings Article•

Does Putting a Linguist in the Loop Improve NLU Data Collection

[...]

New York University¹, Columbia University²

01 Nov 2021

TL;DR: The authors compare three data collection protocols: (i) a baseline protocol with no linguist involvement, (ii) a linguist-in-the-loop intervention with iteratively-updated constraints on the writing task, and (iii) an extension that adds direct interaction between linguists and crowdworkers via a chatroom.

...read moreread less

Abstract: Many crowdsourced NLP datasets contain systematic artifacts that are identified only after data collection is complete. Earlier identification of these issues should make it easier to create high-quality training and evaluation data. We attempt this by evaluating protocols in which expert linguists work ‘in the loop’ during data collection to identify and address these issues by adjusting task instructions and incentives. Using natural language inference as a test case, we compare three data collection protocols: (i) a baseline protocol with no linguist involvement, (ii) a linguist-in-the-loop intervention with iteratively-updated constraints on the writing task, and (iii) an extension that adds direct interaction between linguists and crowdworkers via a chatroom. We find that linguist involvement does not lead to increased accuracy on out-of-domain test sets compared to baseline, and adding a chatroom has no effect on the data. Linguist involvement does, however, lead to more challenging evaluation data and higher accuracy on some challenge sets, demonstrating the benefits of integrating expert analysis during data collection.

...read moreread less

Cited by

PDF

Open Access

More filters

Posted Content•

Evaluation of Text Generation: A Survey

[...]

Asli Celikyilmaz¹, Elizabeth Clark², Jianfeng Gao³•Institutions (3)

Facebook¹, University of Washington², Microsoft³

26 Jun 2020-arXiv: Computation and Language

TL;DR: This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.

...read moreread less

Abstract: The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics For each category, we discuss the progress that has been made and the challenges still being faced, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models We then present two examples for task-specific NLG evaluations for automatic text summarization and long text generation, and conclude the paper by proposing future research directions

...read moreread less

186 citations

Posted Content•

A Survey of Knowledge-Enhanced Text Generation.

[...]

Wenhao Yu¹, Chenguang Zhu², Zaitang Li³, Zhiting Hu, Qingyun Wang⁴, Heng Ji⁴, Meng Jiang¹ - Show less +3 more•Institutions (4)

University of Notre Dame¹, Microsoft², The Chinese University of Hong Kong³, University of Illinois at Urbana–Champaign⁴

09 Oct 2020-arXiv: Computation and Language

TL;DR: A comprehensive review of the research on knowledge-enhanced text generation over the past five years is presented, which includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data.

...read moreread less

Abstract: The goal of text generation is to make machines express in human language. It is one of the most important yet challenging tasks in natural language processing (NLP). Since 2014, various neural encoder-decoder models pioneered by Seq2Seq have been proposed to achieve the goal by learning to map input text to output text. However, the input text alone often provides limited knowledge to generate the desired output, so the performance of text generation is still far from satisfaction in many real-world scenarios. To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models. This research direction is known as knowledge-enhanced text generation. In this survey, we present a comprehensive review of the research on knowledge enhanced text generation over the past five years. The main content includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data. This survey can have broad audiences, researchers and practitioners, in academia and industry.

...read moreread less

115 citations

Posted Content•

Deep Learning for Text Style Transfer: A Survey

[...]

Di Jin¹, Zhijing Jin², Zhiting Hu³, Olga Vechtomova⁴, Rada Mihalcea⁵ - Show less +1 more•Institutions (5)

Massachusetts Institute of Technology¹, Max Planck Society², University of California, San Diego³, University of Waterloo⁴, University of Michigan⁵

01 Nov 2020-arXiv: Computation and Language

TL;DR: This article presents a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neuralText style transfer work in 2017, as well as the rich methodologies in the presence of parallel and non-parallel data.

...read moreread less

Abstract: Text style transfer (TST) is an important task in natural language generation (NLG), which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing (NLP), and recently has re-gained significant attention thanks to the promising performance brought by deep neural models. In this paper, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data. We also provide discussions on a variety of important topics regarding the future development of TST. Our curated paper list is at this https URL

...read moreread less

101 citations

Proceedings Article•DOI•

CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities

[...]

Mina Lee, Percy Liang, Qian Yang

18 Jan 2022

TL;DR: It is argued that by curating and analyzing large interaction datasets, the HCI community can foster more incisive examinations of LMs’ generative capabilities, and presents CoAuthor, a dataset designed for revealing GPT-3’s capabilities in assisting creative and argumentative writing.

...read moreread less

Abstract: Large language models (LMs) offer unprecedented language generation capabilities and exciting opportunities for interaction design. However, their highly context-dependent capabilities are difficult to grasp and are often subjectively interpreted. In this paper, we argue that by curating and analyzing large interaction datasets, the HCI community can foster more incisive examinations of LMs’ generative capabilities. Exemplifying this approach, we present CoAuthor, a dataset designed for revealing GPT-3’s capabilities in assisting creative and argumentative writing. CoAuthor captures rich interactions between 63 writers and four instances of GPT-3 across 1445 writing sessions. We demonstrate that CoAuthor can address questions about GPT-3’s language, ideation, and collaboration capabilities, and reveal its contribution as a writing “collaborator” under various definitions of good collaboration. Finally, we discuss how this work may facilitate a more principled discussion around LMs’ promises and pitfalls in relation to interaction design. The dataset and an interface for replaying the writing sessions are publicly available at https://coauthor.stanford.edu.

...read moreread less

81 citations

Posted Content•

On the Opportunities and Risks of Foundation Models.

[...]

Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel¹, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang - Show less +110 more•Institutions (1)

Stanford University¹

16 Aug 2021-arXiv: Learning

TL;DR: The authors provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e. g.g. model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.

...read moreread less

Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

...read moreread less

76 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15

Collapse