Showing papers on "Commonsense reasoning" published in 2019

Posted Content

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Jiasen Lu¹, Dhruv Batra², Devi Parikh², Stefan Lee²

Salesforce.com¹, Georgia Institute of Technology²

06 Aug 2019 - arXiv: Computer Vision and Pattern Recognition

TL;DR: ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language, is presented. It extends the popular BERT architecture to a multi-modal two-stream model, processing visual and textual inputs in separate streams that interact through co-attentional transformer layers.

Abstract: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers. We pretrain our model through two proxy tasks on the large, automatically collected Conceptual Captions dataset and then transfer it to multiple established vision-and-language tasks -- visual question answering, visual commonsense reasoning, referring expressions, and caption-based image retrieval -- by making only minor additions to the base architecture. We observe significant improvements across tasks compared to existing task-specific models -- achieving state-of-the-art on all four tasks. Our work represents a shift away from learning groundings between vision and language only as part of task training and towards treating visual grounding as a pretrainable and transferable capability.

1,241 citations

Proceedings Article

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Jiasen Lu¹, Dhruv Batra², Devi Parikh², Stefan Lee²

Salesforce.com¹, Georgia Institute of Technology²

06 Aug 2019

TL;DR: The ViLBERT model extends the BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers.

Abstract: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers. We pretrain our model through two proxy tasks on the large, automatically collected Conceptual Captions dataset and then transfer it to multiple established vision-and-language tasks -- visual question answering, visual commonsense reasoning, referring expressions, and caption-based image retrieval -- by making only minor additions to the base architecture. We observe significant improvements across tasks compared to existing task-specific models -- achieving state-of-the-art on all four tasks. Our work represents a shift away from learning groundings between vision and language only as part of task training and towards treating visual grounding as a pretrainable and transferable capability.

1,069 citations

Posted Content

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

Weijie Su¹, Xizhou Zhu¹, Yue Cao¹, Bin Li², Lewei Lu², Furu Wei², Jifeng Dai²

University of Science and Technology of China¹, Microsoft²

22 Aug 2019 - arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper introduces Visual-Linguistic BERT (VL-BERT), a pre-trainable generic representation for visual-linguistic tasks that adopts the simple yet powerful Transformer model as its backbone and extends it to take both visual and linguistic embedded features as input.

Abstract: We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input. In it, each element of the input is either a word from the input sentence, or a region-of-interest (RoI) from the input image. It is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset, together with a text-only corpus. Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and benefit the downstream tasks, such as visual commonsense reasoning, visual question answering and referring expression comprehension. It is worth noting that VL-BERT achieved first place among single models on the leaderboard of the VCR benchmark. Code is released at this https URL.

822 citations

Proceedings Article

From Recognition to Cognition: Visual Commonsense Reasoning

Rowan Zellers¹, Yonatan Bisk¹, Ali Farhadi², Yejin Choi¹

University of Washington¹, Allen Institute for Artificial Intelligence²

15 Jun 2019

TL;DR: To move towards cognition-level understanding, a new reasoning engine is presented, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning.

Abstract: Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world. We formalize this task as Visual Commonsense Reasoning. Given a challenging question about an image, a machine must answer correctly and then provide a rationale justifying its answer. Next, we introduce a new dataset, VCR, consisting of 290k multiple choice QA problems derived from 110k movie scenes. The key recipe for generating non-trivial and high-quality problems at scale is Adversarial Matching, a new approach to transform rich annotations into multiple choice questions with minimal bias. Experimental results show that while humans find VCR easy (over 90% accuracy), state-of-the-art vision models struggle (~45%). To move towards cognition-level understanding, we present a new reasoning engine, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning. R2C helps narrow the gap between humans and machines (~65%); still, the challenge is far from solved, and we provide analysis that suggests avenues for future work.

687 citations

Journal Article

ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning

Maarten Sap¹, Ronan Le Bras², Emily Allaway¹, Chandra Bhagavatula², Nicholas Lourie², Hannah Rashkin¹, Brendan Roof², Noah A. Smith¹, Yejin Choi¹

University of Washington¹, Allen Institute for Artificial Intelligence²

17 Jul 2019

TL;DR: ATOMIC is an atlas of everyday commonsense reasoning comprising 877k textual descriptions of inferential knowledge, organized as typed if-then relations with variables (e.g., "if X pays Y a compliment, then Y will likely return the compliment").

Abstract: We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed if-then relations with variables (e.g., “if X pays Y a compliment, then Y will likely return the compliment”). We propose nine if-then relation types to distinguish causes vs. effects, agents vs. themes, voluntary vs. involuntary events, and actions vs. mental states. By generatively training on the rich inferential knowledge described in ATOMIC, we show that neural models can acquire simple commonsense capabilities and reason about previously unseen events. Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation.

523 citations

Proceedings Article

Social IQa: Commonsense Reasoning about Social Interactions

Maarten Sap¹, Hannah Rashkin², Derek Chen³, Ronan Le Bras², Yejin Choi¹

University of Washington¹, Allen Institute for Artificial Intelligence², Stanford University³

01 Oct 2019

TL;DR: Social IQa is a large-scale benchmark for commonsense reasoning about social situations, containing 38,000 multiple choice questions for probing emotional and social intelligence in a variety of everyday situations.

Abstract: We introduce Social IQa, the first large-scale benchmark for commonsense reasoning about social situations. Social IQa contains 38,000 multiple choice questions for probing emotional and social intelligence in a variety of everyday situations (e.g., Q: “Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy. Why did Jordan do this?” A: “Make sure no one else could hear”). Through crowdsourcing, we collect commonsense questions along with correct and incorrect answers about social interactions, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to a different but related question. Empirical results show that our benchmark is challenging for existing question-answering models based on pretrained language models, compared to human performance (>20% gap). Notably, we further establish Social IQa as a resource for transfer learning of commonsense knowledge, achieving state-of-the-art performance on multiple commonsense reasoning tasks (Winograd Schemas, COPA).

388 citations

Proceedings Article

Explain Yourself! Leveraging Language Models for Commonsense Reasoning

Nazneen Fatema Rajani¹, Bryan McCann¹, Caiming Xiong¹, Richard Socher¹

Salesforce.com¹

06 Jun 2019

TL;DR: This work collects human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations in a new dataset called Common Sense Explanations to train language models to automatically generate explanations that can be used during training and inference in a novel Commonsense Auto-Generated Explanation framework.

Abstract: Deep learning models perform poorly on tasks that require commonsense reasoning, which often necessitates some form of world-knowledge or reasoning over information not immediately present in the input. We collect human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations in a new dataset called Common Sense Explanations (CoS-E). We use CoS-E to train language models to automatically generate explanations that can be used during training and inference in a novel Commonsense Auto-Generated Explanation (CAGE) framework. CAGE improves the state-of-the-art by 10% on the challenging CommonsenseQA task. We further study commonsense reasoning in DNNs using both human and auto-generated explanations including transfer to out-of-domain tasks. Empirical results indicate that we can effectively leverage language models for commonsense reasoning.

379 citations

Posted Content

WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale

Keisuke Sakaguchi¹, Ronan Le Bras¹, Chandra Bhagavatula¹, Yejin Choi¹

Allen Institute for Artificial Intelligence¹

24 Jul 2019 - arXiv: Computation and Language

TL;DR: The authors introduced WinoGrande, a large-scale dataset of 44k problems, inspired by the original Winograd Schema Challenge (WSC) design, but adjusted to improve both the scale and the hardness of the dataset.

Abstract: The Winograd Schema Challenge (WSC) (Levesque, Davis, and Morgenstern 2011), a benchmark for commonsense reasoning, is a set of 273 expert-crafted pronoun resolution problems originally designed to be unsolvable for statistical models that rely on selectional preferences or word associations. However, recent advances in neural language models have already reached around 90% accuracy on variants of WSC. This raises an important question whether these models have truly acquired robust commonsense capabilities or whether they rely on spurious biases in the datasets that lead to an overestimation of the true capabilities of machine commonsense. To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. The best state-of-the-art methods on WinoGrande achieve 59.4-79.1%, which are 15-35% below human performance of 94.0%, depending on the amount of the training data allowed. Furthermore, we establish new state-of-the-art results on five related benchmarks - WSC (90.1%), DPR (93.1%), COPA (90.6%), KnowRef (85.6%), and Winogender (97.1%). These results have dual implications: on one hand, they demonstrate the effectiveness of WinoGrande when used as a resource for transfer learning. On the other hand, they raise a concern that we are likely to be overestimating the true capabilities of machine commonsense across all these benchmarks. We emphasize the importance of algorithmic bias reduction in existing and future benchmarks to mitigate such overestimation.

366 citations

Proceedings Article

KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning

Bill Yuchen Lin¹, Xinyue Chen¹, Jamin Chen¹, Xiang Ren²

University of Southern California¹, Shanghai Jiao Tong University²

04 Sep 2019

TL;DR: In this paper, the authors proposed a textual inference framework for answering commonsense questions, which effectively utilizes external, structured commonsense knowledge graphs to perform explainable inferences, and achieved state-of-the-art performance on the CommonsenseQA dataset.

Abstract: Commonsense reasoning aims to empower machines with the human ability to make presumptions about ordinary situations in our daily life. In this paper, we propose a textual inference framework for answering commonsense questions, which effectively utilizes external, structured commonsense knowledge graphs to perform explainable inferences. The framework first grounds a question-answer pair from the semantic space to the knowledge-based symbolic space as a schema graph, a related sub-graph of external knowledge graphs. It represents schema graphs with a novel knowledge-aware graph network module named KagNet, and finally scores answers with graph representations. Our model is based on graph convolutional networks and LSTMs, with a hierarchical path-based attention mechanism. The intermediate attention scores make it transparent and interpretable, which thus produce trustworthy inferences. Using ConceptNet as the only external resource for Bert-based models, we achieved state-of-the-art performance on the CommonsenseQA, a large-scale dataset for commonsense reasoning.

246 citations

Posted Content

Abductive Commonsense Reasoning

Chandra Bhagavatula¹, Ronan Le Bras¹, Chaitanya Malaviya¹, Keisuke Sakaguchi¹, Ari Holtzman¹, Hannah Rashkin¹, Doug Downey¹, Scott Wen-tau Yih², Yejin Choi³

Allen Institute for Artificial Intelligence¹, University of Washington², Facebook³

15 Aug 2019 - arXiv: Computation and Language

TL;DR: This study introduces a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations, and conceptualizes two new tasks -- Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and Abductive NLG: a conditional generation task for explaining given observations in natural language.

Abstract: Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks -- (i) Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG: a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform--despite their strong performance on the related but more narrowly defined task of entailment NLI--pointing to interesting avenues for future research.

226 citations

Posted Content

Social Bias Frames: Reasoning about Social and Power Implications of Language

Maarten Sap¹, Saadia Gabriel¹, Lianhui Qin², Dan Jurafsky³, Noah A. Smith⁴, Yejin Choi¹

University of Washington¹, Carnegie Mellon University², Stanford University³, Allen Institute for Artificial Intelligence⁴

10 Nov 2019 - arXiv: Computation and Language

TL;DR: The authors introduce Social Bias Frames, a new conceptual formalism that aims to model the pragmatic frames in which people project social biases and stereotypes onto others, and establish baseline approaches that learn to recover Social Bias Frames from unstructured text.

Abstract: Warning: this paper contains content that may be offensive or upsetting. Language has the power to reinforce stereotypes and project social biases onto others. At the core of the challenge is that it is rarely what is stated explicitly, but rather the implied meanings, that frame people's judgments about others. For example, given a statement that "we shouldn't lower our standards to hire more women," most listeners will infer the implicature intended by the speaker -- that "women (candidates) are less qualified." Most semantic formalisms, to date, do not capture such pragmatic implications in which people express social biases and power differentials in language. We introduce Social Bias Frames, a new conceptual formalism that aims to model the pragmatic frames in which people project social biases and stereotypes onto others. In addition, we introduce the Social Bias Inference Corpus to support large-scale modelling and evaluation with 150k structured annotations of social media posts, covering over 34k implications about a thousand demographic groups. We then establish baseline approaches that learn to recover Social Bias Frames from unstructured text. We find that while state-of-the-art neural models are effective at high-level categorization of whether a given statement projects unwanted social bias (80% F1), they are not effective at spelling out more detailed explanations in terms of Social Bias Frames. Our study motivates future work that combines structured pragmatic inference with commonsense reasoning on social implications.

Posted Content

Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning

Lifu Huang¹, Ronan Le Bras², Chandra Bhagavatula³, Yejin Choi⁴

Rensselaer Polytechnic Institute¹, Allen Institute for Artificial Intelligence², University of Washington³, Microsoft⁴

31 Aug 2019 - arXiv: Computation and Language

TL;DR: This paper introduces Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions, and proposes a new architecture that improves over the competitive baselines.

Abstract: Understanding narratives requires reading between the lines, which in turn, requires interpreting the likely causes and effects of events, even when they are not mentioned explicitly. In this paper, we introduce Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. In stark contrast to most existing reading comprehension datasets where the questions focus on factual and literal understanding of the context paragraph, our dataset focuses on reading between the lines over a diverse collection of people's everyday narratives, asking such questions as "what might be the possible reason of ...?", or "what would have happened if ..." that require reasoning beyond the exact text spans in the context. To establish baseline performances on Cosmos QA, we experiment with several state-of-the-art neural architectures for reading comprehension, and also propose a new architecture that improves over the competitive baselines. Experimental results demonstrate a significant gap between machine (68.4%) and human performance (94%), pointing to avenues for future research on commonsense machine comprehension. Dataset, code and leaderboard is publicly available at this https URL.

Proceedings Article

Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning

Lifu Huang¹, Ronan Le Bras², Chandra Bhagavatula³, Yejin Choi⁴

Rensselaer Polytechnic Institute¹, Allen Institute for Artificial Intelligence², University of Washington³, Microsoft⁴

31 Aug 2019

TL;DR: Cosmos QA is a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. The questions focus on reading between the lines, which in turn requires interpreting the likely causes and effects of events.

Abstract: Understanding narratives requires reading between the lines, which in turn, requires interpreting the likely causes and effects of events, even when they are not mentioned explicitly. In this paper, we introduce Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. In stark contrast to most existing reading comprehension datasets where the questions focus on factual and literal understanding of the context paragraph, our dataset focuses on reading between the lines over a diverse collection of people’s everyday narratives, asking such questions as "what might be the possible reason of ...?", or "what would have happened if ..." that require reasoning beyond the exact text spans in the context. To establish baseline performances on Cosmos QA, we experiment with several state-of-the-art neural architectures for reading comprehension, and also propose a new architecture that improves over the competitive baselines. Experimental results demonstrate a significant gap between machine (68.4%) and human performance (94%), pointing to avenues for future research on commonsense machine comprehension. Dataset, code and leaderboard is publicly available at https://wilburone.github.io/cosmos.

Journal Article

Fuzzy commonsense reasoning for multimodal sentiment analysis

Iti Chaturvedi¹, Ranjan Satapathy¹, Sandro Cavallari¹, Erik Cambria¹

Nanyang Technological University¹

01 Jul 2019 - Pattern Recognition Letters

TL;DR: This work uses a fuzzy logic classifier to predict the degree of a particular emotion in AffectiveSpace; the combined model of deep convolutional neural networks and fuzzy logic is termed the Convolutional Fuzzy Sentiment Classifier.

Posted Content

CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

Bill Yuchen Lin¹, Wangchunshu Zhou², Ming Shen¹, Pei Zhou¹, Chandra Bhagavatula³, Yejin Choi⁴, Xiang Ren¹

University of Southern California¹, Beihang University², Allen Institute for Artificial Intelligence³, Microsoft⁴

09 Nov 2019 - arXiv: Computation and Language

TL;DR: This paper presents CommonGen, a constrained text generation task with an associated benchmark dataset that explicitly tests machines for the ability of generative commonsense reasoning, and demonstrates that the learned generative commonsense reasoning capability can be transferred to improve downstream tasks such as CommonsenseQA by generating additional context.

Abstract: Recently, large-scale pre-trained language models have demonstrated impressive performance on several commonsense-reasoning benchmark datasets. However, building machines with commonsense to compose realistically plausible sentences remains challenging. In this paper, we present a constrained text generation task, CommonGen, associated with a benchmark dataset, to explicitly test machines for the ability of generative commonsense reasoning. Given a set of common concepts (e.g., {dog, frisbee, catch, throw}), the task is to generate a coherent sentence describing an everyday scenario using these concepts (e.g., "a man throws a frisbee and his dog catches it"). The CommonGen task is challenging because it inherently requires 1) relational reasoning with background commonsense knowledge, and 2) compositional generalization ability to work on unseen concept combinations. Our dataset, constructed through a combination of crowdsourced and existing caption corpora, consists of 79k commonsense descriptions over 35k unique concept-sets. Experiments show that there is a large gap between state-of-the-art text generation models (e.g., T5) and human performance. Furthermore, we demonstrate that the learned generative commonsense reasoning capability can be transferred to improve downstream tasks such as CommonsenseQA by generating additional context.

Posted Content

KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning

Bill Yuchen Lin¹, Xinyue Chen¹, Jamin Chen¹, Xiang Ren²

University of Southern California¹, Shanghai Jiao Tong University²

04 Sep 2019 - arXiv: Computation and Language

TL;DR: This paper proposes a textual inference framework for answering commonsense questions, which effectively utilizes external, structured commonsense knowledge graphs to perform explainable inferences.

Proceedings Article

Fusion of Detected Objects in Text for Visual Question Answering

Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter¹

Penn State College of Information Sciences and Technology¹

14 Aug 2019

TL;DR: The authors introduced a simple yet powerful neural architecture for data that combines vision and natural language, which leverages referential information binding words to portions of the image in a single unified architecture.

Abstract: To advance models of multimodal context, we introduce a simple yet powerful neural architecture for data that combines vision and natural language. The “Bounding Boxes in Text Transformer” (B2T2) also leverages referential information binding words to portions of the image in a single unified architecture. B2T2 is highly effective on the Visual Commonsense Reasoning benchmark, achieving a new state-of-the-art with a 25% relative reduction in error rate compared to published baselines and obtaining the best performance to date on the public leaderboard (as of May 22, 2019). A detailed ablation analysis shows that the early integration of the visual features into the text analysis is key to the effectiveness of the new architecture. A reference implementation of our models is provided.

Proceedings Article

A Surprisingly Robust Trick for the Winograd Schema Challenge

Vid Kocijan¹, Ana-Maria Cretu, Oana-Maria Camburu², Yordan Yordanov², Thomas Lukasiewicz²

University of Ljubljana¹, University of Oxford²

15 May 2019

TL;DR: This paper shows that the performance of three language models on WSC273 strongly improves when fine-tuned on a similar pronoun disambiguation problem dataset (denoted WSCR), and generates a large unsupervised WSC-like dataset.

Abstract: The Winograd Schema Challenge (WSC) dataset WSC273 and its inference counterpart WNLI are popular benchmarks for natural language understanding and commonsense reasoning. In this paper, we show that the performance of three language models on WSC273 consistently and robustly improves when fine-tuned on a similar pronoun disambiguation problem dataset (denoted WSCR). We additionally generate a large unsupervised WSC-like dataset. By fine-tuning the BERT language model both on the introduced and on the WSCR dataset, we achieve overall accuracies of 72.5% and 74.7% on WSC273 and WNLI, improving the previous state-of-the-art solutions by 8.8% and 9.6%, respectively. Furthermore, our fine-tuned models are also consistently more accurate on the “complex” subsets of WSC273, introduced by Trichelair et al. (2018).

Posted Content

Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training

Gen Li, Nan Duan, Yuejian Fang, Ming Gong, Daxin Jiang, Ming Zhou

16 Aug 2019 - arXiv: Computer Vision and Pattern Recognition

TL;DR: Unicoder-VL is a universal encoder that learns joint representations of vision and language through cross-modal pre-training with a multi-layer Transformer.

Abstract: We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. Borrow ideas from cross-lingual pre-trained models, such as XLM and Unicoder, both visual and linguistic contents are fed into a multi-layer Transformer for the cross-modal pre-training, where three pre-trained tasks are employed, including Masked Language Modeling (MLM), Masked Object Classification (MOC) and Visual-linguistic Matching (VLM). The first two tasks learn context-aware representations for input tokens based on linguistic and visual contents jointly. The last task tries to predict whether an image and a text describe each other. After pretraining on large-scale image-caption pairs, we transfer Unicoder-VL to caption-based image-text retrieval and visual commonsense reasoning, with just one additional output layer. We achieve state-of-the-art or comparable results on both two tasks and show the powerful ability of the cross-modal pre-training.

Posted Content•

Fusion of Detected Objects in Text for Visual Question Answering

Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter¹•Institutions (1)

Google¹

14 Aug 2019-arXiv: Computation and Language

TL;DR: A detailed ablation analysis shows that the early integration of the visual features into the text analysis is key to the effectiveness of the new architecture.

Abstract: To advance models of multimodal context, we introduce a simple yet powerful neural architecture for data that combines vision and natural language. The "Bounding Boxes in Text Transformer" (B2T2) also leverages referential information binding words to portions of the image in a single unified architecture. B2T2 is highly effective on the Visual Commonsense Reasoning benchmark (this https URL), achieving a new state-of-the-art with a 25% relative reduction in error rate compared to published baselines and obtaining the best performance to date on the public leaderboard (as of May 22, 2019). A detailed ablation analysis shows that the early integration of the visual features into the text analysis is key to the effectiveness of the new architecture. A reference implementation of our models is provided (this https URL).
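
The "25% relative reduction in error rate" in this abstract compares error rates rather than accuracies. A minimal sketch of that arithmetic follows; the function name and the accuracy figures in the usage line are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Fraction of the baseline's error that the new model removes.

    Both arguments are accuracies in [0, 1]; baseline_acc must be below 1.
    """
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# Hypothetical numbers: moving from 60% to 70% accuracy cuts the error
# from 40% to 30%, which is roughly a quarter of the original error.
print(round(relative_error_reduction(0.60, 0.70), 4))
```

Because the measure is relative to the baseline error, the same absolute gain in accuracy counts for more when the baseline is already strong.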

Posted Content•

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Shane Storks, Qiaozi Gao, Joyce Y. Chai

02 Apr 2019

TL;DR: This paper aims to provide an overview of existing tasks and benchmarks, knowledge resources, and learning and inference approaches toward commonsense reasoning for natural language understanding to support a better understanding of the state of the art, its limitations, and future challenges.

Abstract: Commonsense knowledge and commonsense reasoning are some of the main bottlenecks in machine intelligence. In the NLP community, many benchmark datasets and tasks have been created to address commonsense reasoning for language understanding. These tasks are designed to assess machines' ability to acquire and learn commonsense knowledge in order to reason and understand natural language text. As these tasks become instrumental and a driving force for commonsense research, this paper aims to provide an overview of existing tasks and benchmarks, knowledge resources, and learning and inference approaches toward commonsense reasoning for natural language understanding. Through this, our goal is to support a better understanding of the state of the art, its limitations, and future challenges.

Posted Content•

FreeLB: Enhanced Adversarial Training for Language Understanding

Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

25 Sep 2019

TL;DR: This article proposed a novel adversarial training algorithm, FreeLB, that promotes higher invariance in the embedding space, by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples.

Abstract: Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models. In this work, we propose a novel adversarial training algorithm, FreeLB, that promotes higher invariance in the embedding space, by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples. To validate the effectiveness of the proposed approach, we apply it to Transformer-based models for natural language understanding and commonsense reasoning tasks. Experiments on the GLUE benchmark show that when applied only to the finetuning stage, it is able to improve the overall test scores of BERT-base model from 78.3 to 79.4, and RoBERTa-large model from 88.5 to 88.8. In addition, the proposed approach achieves state-of-the-art single-model test accuracies of 85.44% and 67.75% on ARC-Easy and ARC-Challenge. Experiments on CommonsenseQA benchmark further demonstrate that FreeLB can be generalized and boost the performance of RoBERTa-large model on other tasks as well. Code is available at this https URL.
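
The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of perturbing embeddings adversarially and training on the perturbed inputs. It uses a toy logistic classifier on a single embedding vector. All function names, the step sizes, the L2-ball projection, and the choice to average the weight gradient over the ascent steps are assumptions made for this sketch, not details confirmed by the listing.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def logistic_loss(w, x, y):
    """Binary cross-entropy of a linear classifier w on embedding x."""
    p = sigmoid(dot(w, x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def adversarial_grad(w, x, y, steps=3, step_size=0.1, eps=0.3):
    """Average the weight gradient over `steps` perturbed copies of x.

    The perturbation delta is grown by normalized gradient ascent on the
    loss and kept inside an L2 ball of radius eps (assumed constraint).
    """
    delta = [0.0] * len(x)
    grad_w = [0.0] * len(w)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        err = sigmoid(dot(w, x_adv)) - y  # d(loss)/d(logit)
        # Accumulate the parameter gradient at the current perturbed input.
        grad_w = [g + err * xa / steps for g, xa in zip(grad_w, x_adv)]
        # The gradient of the loss w.r.t. the perturbation is err * w.
        g_delta = [err * wi for wi in w]
        g_norm = norm(g_delta)
        if g_norm > 0:
            delta = [d + step_size * g / g_norm for d, g in zip(delta, g_delta)]
        # Project back onto the eps-ball if the step left it.
        d_norm = norm(delta)
        if d_norm > eps:
            delta = [d * eps / d_norm for d in delta]
    return grad_w, delta


# Toy usage with made-up weights and a made-up embedding.
grad_w, delta = adversarial_grad([1.0, -2.0], [0.5, 0.25], 1)
print(grad_w, delta)
```

A training loop would then apply `grad_w` as an ordinary gradient step on the weights; in a real Transformer the same idea would be applied to the token embedding matrix output rather than to a single vector.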

Book Chapter•DOI•

Improving Question Answering by Commonsense-Based Pre-training

Wanjun Zhong¹, Duyu Tang², Nan Duan², Ming Zhou², Jiahai Wang¹, Jian Yin¹•Institutions (2)

Sun Yat-sen University¹, Microsoft²

09 Oct 2019

TL;DR: In this paper, the authors propose to pre-train direct and indirect relational functions between concepts, and show that these pre-trained functions could be easily added to existing neural network models.

Abstract: Although neural network approaches achieve remarkable success on a variety of NLP tasks, many of them struggle to answer questions that require commonsense knowledge. We believe the main reason is the lack of commonsense connections between concepts. To remedy this, we provide a simple and effective method that leverages an external commonsense knowledge base such as ConceptNet. We pre-train direct and indirect relational functions between concepts, and show that these pre-trained functions could be easily added to existing neural network models. Results show that incorporating commonsense-based functions improves the state-of-the-art on three question answering tasks that require commonsense reasoning. Further analysis shows that our system discovers and leverages useful evidence from an external commonsense knowledge base, which is missing in existing neural network models and helps derive the correct answer.

Posted Content•

Do Neural Language Representations Learn Physical Commonsense

Maxwell Forbes, Ari Holtzman, Yejin Choi¹•Institutions (1)

University of Washington¹

08 Aug 2019-arXiv: Computation and Language

TL;DR: The authors investigated the extent to which state-of-the-art neural language representations, trained on a vast amount of natural language text, demonstrate physical commonsense reasoning and found that neural language models still only learn associations that are explicitly written down.

Abstract: Humans understand language based on the rich background knowledge about how the physical world works, which in turn allows us to reason about the physical world through language. In addition to the properties of objects (e.g., boats require fuel) and their affordances, i.e., the actions that are applicable to them (e.g., boats can be driven), we can also reason about if-then inferences between what properties of objects imply the kind of actions that are applicable to them (e.g., that if we can drive something then it likely requires fuel). In this paper, we investigate the extent to which state-of-the-art neural language representations, trained on a vast amount of natural language text, demonstrate physical commonsense reasoning. While recent advancements of neural language models have demonstrated strong performance on various types of natural language inference tasks, our study based on a dataset of over 200k newly collected annotations suggests that neural language representations still only learn associations that are explicitly written down.

Proceedings Article•DOI•

Improving Neural Story Generation by Targeted Common Sense Grounding

Huanru Henry Mao¹, Bodhisattwa Prasad Majumder¹, Julian McAuley¹, Garrison W. Cottrell¹•Institutions (1)

University of California, San Diego¹

01 Nov 2019

TL;DR: A simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding.

Abstract: Stories generated with neural language models have shown promise in grammatical and stylistic consistency. However, the generated stories are still lacking in common sense reasoning, e.g., they often contain sentences deprived of world knowledge. We propose a simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding. When combined with our two-stage fine-tuning pipeline, our method achieves improved common sense reasoning and state-of-the-art perplexity on the WritingPrompts (Fan et al., 2018) story generation dataset.

Journal Article•

Do Neural Language Representations Learn Physical Commonsense

Maxwell Forbes, Ari Holtzman, Yejin Choi

01 Aug 2019-Cognitive Science

TL;DR: While recent advancements of neural language models have demonstrated strong performance on various types of natural language inference tasks, this study based on a dataset of over 200k newly collected annotations suggests that neural language representations still only learn associations that are explicitly written down.

Posted Content•

Explain Yourself! Leveraging Language Models for Commonsense Reasoning

Nazneen Fatema Rajani¹, Bryan McCann¹, Caiming Xiong¹, Richard Socher¹•Institutions (1)

Salesforce.com¹

06 Jun 2019-arXiv: Computation and Language

TL;DR: This article used CoS-E (Common Sense Explanations), a dataset of human explanations for commonsense reasoning, to train language models to automatically generate explanations that can be used during training and inference in a novel Commonsense Auto-Generated Explanation (CAGE) framework.

Proceedings Article•DOI•

CODAH: An adversarially-authored question answering dataset for common sense

Michael Chen¹, Mike D'Arcy¹, Alisa Liu¹, Jared Fernandez¹, Doug Downey•Institutions (1)

Northwestern University¹

08 Apr 2019

TL;DR: CODAH is an adversarially-constructed evaluation dataset for testing commonsense knowledge, where workers are rewarded for submitting questions that models fail to answer both before and after fine-tuning.

Abstract: Commonsense reasoning is a critical AI capability, but it is difficult to construct challenging datasets that test common sense. Recent neural question answering systems, based on large pre-trained models of language, have already achieved near-human-level performance on commonsense knowledge benchmarks. These systems do not possess human-level common sense, but are able to exploit limitations of the datasets to achieve human-level scores. We introduce the CODAH dataset, an adversarially-constructed evaluation dataset for testing common sense. CODAH forms a challenging extension to the recently-proposed SWAG dataset, which tests commonsense knowledge using sentence-completion questions that describe situations observed in video. To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems. Workers are rewarded for submissions that models fail to answer correctly both before and after fine-tuning (in cross-validation). We create 2.8k questions via this procedure and evaluate the performance of multiple state-of-the-art question answering systems on our dataset. We observe a significant gap between human performance, which is 95.3%, and the best baseline accuracy of 65.3%, achieved by the OpenAI GPT model.

Proceedings Article•DOI•

How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG

Paul Trichelair, Ali Emami¹, Adam Trischler², Kaheer Suleman¹, Jackie Chi Kit Cheung¹•Institutions (2)

McGill University¹, Microsoft²

01 Sep 2019

TL;DR: This paper makes case studies of both benchmarks and design protocols that clarify and qualify the results of previous work by analyzing threats to the validity of previous experimental designs.

Abstract: Recent studies have significantly improved the state-of-the-art on common-sense reasoning (CSR) benchmarks like the Winograd Schema Challenge (WSC) and SWAG. The question we ask in this paper is whether improved performance on these benchmarks represents genuine progress towards common-sense-enabled systems. We make case studies of both benchmarks and design protocols that clarify and qualify the results of previous work by analyzing threats to the validity of previous experimental designs. Our protocols account for several properties prevalent in common-sense benchmarks including size limitations, structural regularities, and variable instance difficulty.

Proceedings Article•DOI•

Attention Is (not) All You Need for Commonsense Reasoning

Tassilo Klein¹, Moin Nabi¹•Institutions (1)

University of Trento¹

01 Jul 2019

TL;DR: This article proposed an attention-guided commonsense reasoning method based on the BERT model, which can be used for tasks such as the Pronoun Disambiguation Problem and the Winograd Schema Challenge.

Abstract: The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and the Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well in all cases while outperforming the previously reported state of the art by a margin. While results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.
