Author

Aman Madaan

Bio: Aman Madaan is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics including Computer science and Defeasible reasoning. The author has an h-index of 5 and has co-authored 20 publications receiving 182 citations.

Papers
Proceedings ArticleDOI
29 Apr 2020
TL;DR: This paper introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning, and designs a tag and generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style while preserving most of the source content.
Abstract: This paper introduces a new task of politeness transfer, which involves converting non-polite sentences to polite sentences while preserving the meaning. We also provide a dataset of more than 1.39 million instances automatically labeled for politeness to encourage benchmark evaluations on this new task. We design a tag-and-generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style while preserving most of the source content. For politeness as well as five other transfer tasks, our model outperforms the state-of-the-art methods on automatic metrics for content preservation, with comparable or better performance on style transfer accuracy. Additionally, our model surpasses existing methods on human evaluations for grammaticality, meaning preservation, and transfer accuracy across all six style transfer tasks. The data and code are available at https://github.com/tag-and-generate.
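The tag-and-generate idea described above is a two-stage pipeline: first mark the spans that carry the source style, then rewrite those spans in the target style while copying the rest of the sentence. A toy sketch of that flow, assuming a hand-written phrase table in place of the paper's learned tagger and generator:

```python
# Toy sketch of a tag-and-generate pipeline. The phrase table and [TAG:...]
# convention are illustrative assumptions, not the paper's trained models.
IMPOLITE_TO_POLITE = {
    "send me": "could you please send me",
    "give me": "would you mind giving me",
}

def tag(sentence: str) -> str:
    # Stage 1: replace stylistic attribute spans with placeholder tags.
    for phrase in IMPOLITE_TO_POLITE:
        sentence = sentence.replace(phrase, f"[TAG:{phrase}]")
    return sentence

def generate(tagged: str) -> str:
    # Stage 2: rewrite each tagged span in the target (polite) style,
    # leaving the remaining content untouched.
    for phrase, polite in IMPOLITE_TO_POLITE.items():
        tagged = tagged.replace(f"[TAG:{phrase}]", polite)
    return tagged

print(generate(tag("send me the report by tonight")))
# -> "could you please send me the report by tonight"
```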

123 citations

Journal ArticleDOI
TL;DR: Program-Aided Language Models (PAL) as discussed by the authors uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter.
Abstract: Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, they often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and other benchmarks. In all these natural language reasoning tasks, generating code using an LLM and reasoning using a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B with chain-of-thought by an absolute 15% in top-1 accuracy. Our code and data are publicly available at http://reasonwithpal.com/.
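A minimal sketch of the PAL loop described above: the LLM's only job is to emit a small Python program whose variables encode the reasoning steps, and the interpreter does the arithmetic. The `generate_program` stub below is an assumption standing in for a few-shot call to Codex or another code LLM; the hard-coded program is what such a call might return for a classic GSM8K-style question.

```python
def generate_program(question: str) -> str:
    """Placeholder for a few-shot LLM call that returns Python code
    ending in a variable named `answer` (an assumption for this sketch)."""
    return (
        "eggs_total = 16\n"
        "eggs_eaten = 3\n"
        "eggs_baked = 4\n"
        "eggs_sold = eggs_total - eggs_eaten - eggs_baked\n"
        "answer = eggs_sold * 2\n"
    )

def pal_solve(question: str):
    # The LLM decomposes the problem into runnable steps;
    # the actual computation is offloaded to the Python runtime.
    program = generate_program(question)
    namespace = {}
    exec(program, namespace)
    return namespace["answer"]

print(pal_solve("Janet's ducks lay 16 eggs per day ..."))  # -> 18
```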

105 citations

Journal ArticleDOI
TL;DR: Self-Refine as mentioned in this paper uses a single LLM as the generator, refiner, and feedback provider to generate an initial output using an LLM; then, the same LLM provides feedback for its output and uses it to refine itself.
Abstract: Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLM; then, the same LLM provides feedback on its output and uses it to refine itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner, and feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art LLMs (GPT-3.5, ChatGPT, and GPT-4). Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach.
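Because Self-Refine needs no extra training, the whole method reduces to a generate-feedback-refine loop around a single model call. A compact sketch of that loop, assuming `llm` is any text-in/text-out callable; the prompt templates and STOP convention are illustrative, not the paper's exact prompts.

```python
from typing import Callable

def self_refine(task: str, llm: Callable[[str], str], max_iters: int = 3) -> str:
    # Step 1: initial generation.
    output = llm(f"Task: {task}\nDraft a response.")
    for _ in range(max_iters):
        # Step 2: the same model critiques its own output.
        feedback = llm(f"Task: {task}\nResponse: {output}\n"
                       "Give actionable feedback, or say STOP if it is already good.")
        if "STOP" in feedback:
            break
        # Step 3: refine the output using the feedback, then repeat.
        output = llm(f"Task: {task}\nResponse: {output}\nFeedback: {feedback}\n"
                     "Rewrite the response, incorporating the feedback.")
    return output

# Tiny demo with a fake "LLM" that accepts the first draft.
fake_llm = lambda prompt: "STOP" if "feedback" in prompt.lower() else "Hello there!"
print(self_refine("Write a greeting.", fake_llm))  # -> "Hello there!"
```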

63 citations

Proceedings Article
12 Feb 2016
TL;DR: This work designs two extraction systems that require minimal human supervision per relation: NumberRule, a rule-based extractor, and NumberTron, a probabilistic graphical model; both dramatically outperform MultiR, a state-of-the-art non-numerical IE model, obtaining up to a 25-point F-score improvement.
Abstract: We study a novel task of numerical relation extraction with the goal of extracting relations where one of the arguments is a number or a quantity (e.g., atomic number(Aluminium, 13), inflation rate(India, 10.9%)). This task presents peculiar challenges not found in standard Information Extraction (IE), such as the difficulty of matching numbers in distant supervision and the importance of units. We design two extraction systems that require minimal human supervision per relation: (1) NumberRule, a rule-based extractor, and (2) NumberTron, a probabilistic graphical model. We find that both systems dramatically outperform MultiR, a state-of-the-art non-numerical IE model, obtaining up to a 25-point F-score improvement.
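To make the task concrete, here is a toy rule-based extractor in the spirit of NumberRule: match a number appearing near a relation keyword and emit a (relation, entity, value) triple. The patterns below are illustrative assumptions and ignore the unit handling and distant-supervision issues the paper actually tackles.

```python
import re

# Hand-written patterns per relation (illustrative, not the paper's rules).
PATTERNS = {
    "inflation rate": re.compile(r"inflation (?:rate )?(?:of|in) (\w+).*?([\d.]+)\s*%"),
    "atomic number": re.compile(r"atomic number of (\w+) is (\d+)"),
}

def extract(sentence: str):
    triples = []
    for relation, pattern in PATTERNS.items():
        for entity, value in pattern.findall(sentence):
            triples.append((relation, entity, value))
    return triples

print(extract("The inflation rate in India rose to 10.9% last year."))
# -> [('inflation rate', 'India', '10.9')]
```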

52 citations

Proceedings ArticleDOI
13 Oct 2022
TL;DR: This paper shows that when this task is framed as a code generation task, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all.
Abstract: We address the general task of structured commonsense reasoning: given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph. To employ large language models (LMs) for this task, existing approaches 'serialize' the output graph as a flat list of nodes and edges. Although feasible, these serialized graphs deviate strongly from the natural language corpora that LMs were pre-trained on, hindering LMs from generating them correctly. In this paper, we show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all. We demonstrate our approach across three diverse structured commonsense reasoning tasks. In all these natural language tasks, we show that using our approach, a code generation LM (Codex) outperforms natural-language LMs that are fine-tuned on the target task (T5) and other strong LMs such as GPT-3 in the few-shot setting.
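The contrast between the two framings is easiest to see with a small example: instead of asking an LM to emit a flat node/edge list, the graph is written as ordinary-looking Python that a code LM can complete, and the structure is read back from the code. The class layout and step names below are illustrative assumptions, not the exact serialization the authors use.

```python
class BakeACake:
    # A script graph expressed as code: each step() call encodes one edge,
    # so the LM generates something resembling the Python it was pre-trained
    # on rather than a flat "node1, node2, edge(1, 2)" serialization.
    goal = "bake a cake"

    def __init__(self):
        self.edges = []
        self.step("gather ingredients", after=None)
        self.step("mix the batter", after="gather ingredients")
        self.step("bake in the oven", after="mix the batter")

    def step(self, name, after):
        if after is not None:
            self.edges.append((after, name))

print(BakeACake().edges)
# -> [('gather ingredients', 'mix the batter'), ('mix the batter', 'bake in the oven')]
```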

49 citations


Cited by
Proceedings ArticleDOI
04 Mar 2022
TL;DR: The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent, yielding improvements in truthfulness and reductions in toxic output generation while incurring minimal performance regressions on public NLP datasets.
Abstract: Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
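The pipeline above hinges on a reward model trained from human rankings (stage 2), which the reinforcement-learning fine-tuning (stage 3) then optimizes against. Below is a scalar sketch of the pairwise comparison loss typically used for such a reward model; treating rewards as plain floats is a simplification, and the numbers are made up for illustration.

```python
import math

def reward_model_loss(reward_chosen: float, reward_rejected: float) -> float:
    # Train the reward model so the labeler-preferred output scores higher:
    # loss = -log(sigmoid(r_chosen - r_rejected)).
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

print(round(reward_model_loss(1.2, -0.3), 3))  # small loss: ranking already respected
print(round(reward_model_loss(-0.3, 1.2), 3))  # large loss: ranking violated
```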

1,704 citations

Journal ArticleDOI
TL;DR: BLOOM as discussed by the authors is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

407 citations

Proceedings Article
25 Feb 2022
TL;DR: This paper shows that ground truth demonstrations are in fact not required and that other aspects of the demonstrations are the key drivers of end task performance, including the fact that they provide a few examples of the label space, the distribution of the input text, and the overall format of the sequence.
Abstract: Large language models (LMs) are able to in-context learn—perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs. However, there has been little understanding of how the model learns and which aspects of the demonstrations contribute to end task performance. In this paper, we show that ground truth demonstrations are in fact not required—randomly replacing labels in the demonstrations barely hurts performance on a range of classification and multi-choice tasks, consistently over 12 different models including GPT-3. Instead, we find that other aspects of the demonstrations are the key drivers of end task performance, including the fact that they provide a few examples of (1) the label space, (2) the distribution of the input text, and (3) the overall format of the sequence. Together, our analysis provides a new way of understanding how and why in-context learning works, while opening up new questions about how much can be learned from large language models through inference alone.
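The core manipulation in this study is easy to state in code: build the k-shot prompt as usual, but swap each demonstration's gold label for a random label drawn from the label space. The template, label set, and example reviews below are illustrative assumptions.

```python
import random

LABEL_SPACE = ["positive", "negative"]

def build_prompt(demos, test_input, random_labels=True, seed=0):
    rng = random.Random(seed)
    lines = []
    for text, gold in demos:
        # With random_labels=True, the demonstration label is drawn at random
        # from the label space instead of using the gold label.
        label = rng.choice(LABEL_SPACE) if random_labels else gold
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {test_input}\nSentiment:")
    return "\n\n".join(lines)

demos = [("A delightful film.", "positive"), ("Painfully dull.", "negative")]
print(build_prompt(demos, "An instant classic."))
```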

299 citations

Posted Content
TL;DR: This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.
Abstract: The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics. For each category, we discuss the progress that has been made and the challenges still being faced, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models. We then present two examples for task-specific NLG evaluations for automatic text summarization and long text generation, and conclude the paper by proposing future research directions.

186 citations