
Showing papers by "Marco Baroni published in 2016"


Proceedings Article
04 Nov 2016
TL;DR: This paper proposes a framework for language learning that relies on multi-agent communication in the context of referential games, where a sender sees a pair of images, is told which one is the target, and sends a message from a fixed vocabulary; the receiver must rely on this message to identify the target.
Abstract: The current mainstream approach to train natural language systems is to expose them to large amounts of text. This passive learning is problematic if we are interested in developing interactive machines, such as conversational agents. We propose a framework for language learning that relies on multi-agent communication. We study this learning in the context of referential games. In these games, a sender and a receiver see a pair of images. The sender is told one of them is the target and is allowed to send a message from a fixed, arbitrary vocabulary to the receiver. The receiver must rely on this message to identify the target. Thus, the agents develop their own language interactively out of the need to communicate. We show that two networks with simple configurations are able to learn to coordinate in the referential game. We further explore how to make changes to the game environment to cause the "word meanings" induced in the game to better reflect intuitive semantic properties of the images. In addition, we present a simple strategy for grounding the agents' code into natural language. Both of these are necessary steps towards developing machines that are able to communicate with humans productively.

327 citations
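The game dynamics described in the abstract can be sketched as follows. This is a toy illustration of the sender/receiver protocol, not the authors' implementation: the agents here share a hand-coded convention instead of learning one by playing, and the "images" are small made-up feature vectors standing in for real image embeddings.

```python
VOCAB = ["sym0", "sym1"]  # fixed, arbitrary vocabulary

# Hypothetical "images": tiny feature vectors in place of real embeddings.
IMAGES = {"cat": [1.0, 0.1], "car": [0.0, 0.9]}

def sender(target_name, vocab):
    """Map the target's dominant feature dimension to a symbol.
    (In the paper, a trained network does this.)"""
    feats = IMAGES[target_name]
    dominant = max(range(len(feats)), key=lambda i: feats[i])
    return vocab[dominant]

def receiver(message, candidates, vocab):
    """Pick the candidate scoring highest on the dimension the
    message conventionally encodes."""
    dim = vocab.index(message)
    return max(candidates, key=lambda name: IMAGES[name][dim])

def play_round(target, distractor):
    """One referential-game round: sender signals, receiver guesses."""
    message = sender(target, VOCAB)
    guess = receiver(message, [target, distractor], VOCAB)
    return guess == target
```

In the actual setup the convention is not given: both networks start from scratch and coordination success is the only training signal.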


Posted Content
TL;DR: It is shown that LAMBADA exemplifies a wide range of linguistic phenomena, and that none of several state-of-the-art language models reaches accuracy above 1% on this novel benchmark.
Abstract: We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAMBADA, computational models cannot simply rely on local context, but must be able to keep track of information in the broader discourse. We show that LAMBADA exemplifies a wide range of linguistic phenomena, and that none of several state-of-the-art language models reaches accuracy above 1% on this novel benchmark. We thus propose LAMBADA as a challenging test set, meant to encourage the development of new models capable of genuine understanding of broad context in natural language text.

212 citations
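The word-prediction protocol can be made concrete with a small evaluation harness. This is a hypothetical sketch, not the released evaluation code: `model` stands for any function from a context string to a single predicted word.

```python
def lambada_accuracy(model, passages):
    """Fraction of passages whose final word `model` predicts exactly,
    given everything before it as context."""
    correct = 0
    for passage in passages:
        *context, target = passage.split()
        if model(" ".join(context)) == target:
            correct += 1
    return correct / len(passages)

# A trivial baseline in the spirit of the paper's finding that local
# context alone is not enough: always repeat the last context word.
def repeat_last_word(context):
    return context.split()[-1]
```

Any model that only exploits the immediately preceding words will, by design of the dataset, score near zero.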


Posted Content
TL;DR: It is shown that two networks with simple configurations are able to learn to coordinate in the referential game and how to make changes to the game environment to cause the "word meanings" induced in the game to better reflect intuitive semantic properties of the images.
Abstract: The current mainstream approach to train natural language systems is to expose them to large amounts of text. This passive learning is problematic if we are interested in developing interactive machines, such as conversational agents. We propose a framework for language learning that relies on multi-agent communication. We study this learning in the context of referential games. In these games, a sender and a receiver see a pair of images. The sender is told one of them is the target and is allowed to send a message from a fixed, arbitrary vocabulary to the receiver. The receiver must rely on this message to identify the target. Thus, the agents develop their own language interactively out of the need to communicate. We show that two networks with simple configurations are able to learn to coordinate in the referential game. We further explore how to make changes to the game environment to cause the "word meanings" induced in the game to better reflect intuitive semantic properties of the images. In addition, we present a simple strategy for grounding the agents' code into natural language. Both of these are necessary steps towards developing machines that are able to communicate with humans productively.

129 citations


Proceedings ArticleDOI
20 Jun 2016
TL;DR: The LAMBADA dataset as discussed by the authors is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word.
Abstract: We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAMBADA, computational models cannot simply rely on local context, but must be able to keep track of information in the broader discourse. We show that LAMBADA exemplifies a wide range of linguistic phenomena, and that none of several state-of-the-art language models reaches accuracy above 1% on this novel benchmark. We thus propose LAMBADA as a challenging test set, meant to encourage the development of new models capable of genuine understanding of broad context in natural language text.

117 citations


Journal ArticleDOI
TL;DR: This article reviews how methods from computer vision are exploited to tackle the fundamental problem of grounding distributional semantic models, bringing them closer to providing a full-fledged computational account of meaning.
Abstract: Distributional semantic models build vector-based word meaning representations on top of contextual information extracted from large collections of text. Object recognition methods from computer vision derive vector-based representations of visual content from natural images. This article reviews how methods from computer vision are exploited to tackle the fundamental problem of grounding distributional semantic models, bringing them closer to providing a full-fledged computational account of meaning.

89 citations


Book ChapterDOI
03 Apr 2016
TL;DR: In this article, a simple environment that could be used to incrementally teach a machine the basics of natural language-based communication, as a prerequisite to more complex interaction with human users, is discussed.
Abstract: The development of intelligent machines is one of the biggest unsolved challenges in computer science. In this paper, we propose some fundamental properties these machines should have, focusing in particular on communication and learning. We discuss a simple environment that could be used to incrementally teach a machine the basics of natural-language-based communication, as a prerequisite to more complex interaction with human users. We also present some conjectures on the sort of algorithms the machine should support in order to profitably learn from the environment.

59 citations


Journal ArticleDOI
01 Mar 2016
TL;DR: Qualitative and quantitative error analyses show that many systems are quite sensitive to changes in the proportion of sentence pair types, and degrade in the presence of additional lexico-syntactic complexities that do not affect human judgments; more compositional systems seem to perform better when the task proportions are changed, but the effect needs further confirmation.
Abstract: This paper is an extended description of SemEval-2014 Task 1, the task on the evaluation of Compositional Distributional Semantics Models on full sentences. Systems participating in the task were presented with pairs of sentences and were evaluated on their ability to predict human judgments on (1) semantic relatedness and (2) entailment. Training and testing data were subsets of the SICK (Sentences Involving Compositional Knowledge) data set. SICK was developed with the aim of providing a proper benchmark to evaluate compositional semantic systems, though task participation was open to systems based on any approach. Taking advantage of the SemEval experience, in this paper we analyze the SICK data set, in order to evaluate the extent to which it meets its design goal and to shed light on the linguistic phenomena that are still challenging for state-of-the-art computational semantic systems. Qualitative and quantitative error analyses show that many systems are quite sensitive to changes in the proportion of sentence pair types, and degrade in the presence of additional lexico-syntactic complexities which do not affect human judgments. More compositional systems seem to perform better when the task proportions are changed, but the effect needs further confirmation.

51 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work presents a distributed word learning model that operates on child-directed speech paired with realistic visual scenes that integrates linguistic and extra-linguistic information, handles referential uncertainty, and correctly learns to associate words with objects, even in cases of limited linguistic exposure.
Abstract: Children learn the meaning of words by being exposed to perceptually rich situations (linguistic discourse, visual scenes, etc). Current computational learning models typically simulate these rich situations through impoverished symbolic approximations. In this work, we present a distributed word learning model that operates on child-directed speech paired with realistic visual scenes. The model integrates linguistic and extra-linguistic information (visual and social cues), handles referential uncertainty, and correctly learns to associate words with objects, even in cases of limited linguistic exposure.

24 citations
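A count-based stand-in for the learning problem described above: each "situation" pairs an utterance with the objects in view, without marking which word refers to which object. The paper's model is a distributed neural model operating on real visual scenes; this sketch only illustrates how consistent co-occurrence across situations resolves referential uncertainty.

```python
from collections import defaultdict

def learn_word_meanings(situations):
    """Associate each word with the object it co-occurs with most often.
    `situations` is a list of (utterance_words, objects_in_view) pairs;
    no word-object links are given (referential uncertainty)."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, objects in situations:
        for word in words:
            for obj in objects:
                counts[word][obj] += 1
    # For each word, pick the most frequently co-occurring object.
    return {word: max(objs, key=objs.get) for word, objs in counts.items()}
```

Because "ball" keeps appearing whenever BALL is in view but the distractor objects vary, the correct mapping emerges even though no single situation disambiguates it.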


Posted Content
TL;DR: An interactive multimodal framework for language learning where learners engage in cooperative referential games starting from a tabula rasa setup, and thus develop their own language from the need to communicate in order to succeed at the game.
Abstract: We propose an interactive multimodal framework for language learning. Instead of being passively exposed to large amounts of natural text, our learners (implemented as feed-forward neural networks) engage in cooperative referential games starting from a tabula rasa setup, and thus develop their own language from the need to communicate in order to succeed at the game. Preliminary experiments provide promising results, but also suggest that it is important to ensure that agents trained in this way do not develop an ad-hoc communication code only effective for the game they are playing.

23 citations


Journal ArticleDOI
TL;DR: Examining the neurobiological correlates of three different distributional statistics in simple 2-syllable nonwords, this study found that repetition accuracy was higher for nonwords in which the frequency of the first syllable was high, and that brain responses to distributional statistics were widespread and almost exclusively cortical.

22 citations


Journal ArticleDOI
TL;DR: A large data set of alternative plausibility ratings for conversationally negated nominal predicates is introduced, and it is shown that simple similarity in distributional semantic space provides an excellent fit to subject data.
Abstract: Logical negation is a challenge for distributional semantics, because predicates and their negations tend to occur in very similar contexts, and consequently their distributional vectors are very similar. Indeed, it is not even clear what properties a "negated" distributional vector should possess. However, when linguistic negation is considered in its actual discourse usage, it often performs a role that is quite different from straightforward logical negation. If someone states, in the middle of a conversation, that "This is not a dog," the negation strongly suggests a restricted set of alternative predicates that might hold true of the object being talked about. In particular, other canids and middle-sized mammals are plausible alternatives, birds are less likely, skyscrapers and other large buildings virtually impossible. Conversational negation acts like a graded similarity function, of the sort that distributional semantics might be good at capturing. In this article, we introduce a large data set of alternative plausibility ratings for conversationally negated nominal predicates, and we show that simple similarity in distributional semantic space provides an excellent fit to subject data. On the one hand, this fills a gap in the literature on conversational negation, proposing distributional semantics as the right tool to make explicit predictions about potential alternatives of negated predicates. On the other hand, the results suggest that negation, when addressed from a broader pragmatic perspective, far from being a nuisance, is an ideal application domain for distributional semantic methods.

Journal ArticleDOI
TL;DR: This article shows mathematically that the difference between the PMI dimension of a phrase vector and the sum of PMIs in the corresponding dimensions of the phrase's parts is an independently interpretable value, namely, a quantification of the impact of the context associated with the relevant dimension on thephrase's internal cohesion, as also measured by PMI.
Abstract: Distributional semantic models, deriving vector-based word representations from patterns of word usage in corpora, have many useful applications (Turney and Pantel 2010). Recently, there has been interest in compositional distributional models, which derive vectors for phrases from representations of their constituent words (Mitchell and Lapata 2010). Often, the values of distributional vectors are pointwise mutual information (PMI) scores obtained from raw co-occurrence counts. In this article we study the relation between the PMI dimensions of a phrase vector and its components in order to gain insights into which operations an adequate composition model should perform. We show mathematically that the difference between the PMI dimension of a phrase vector and the sum of PMIs in the corresponding dimensions of the phrase's parts is an independently interpretable value, namely, a quantification of the impact of the context associated with the relevant dimension on the phrase's internal cohesion, as also measured by PMI. We then explore this quantity empirically, through an analysis of adjective-noun composition.
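The identity at the heart of the article can be written out explicitly. Assuming the standard definition $\mathrm{PMI}(x,y)=\log\frac{p(x,y)}{p(x)\,p(y)}$ (the paper's exact notation may differ), for a phrase $ab$ with parts $a$, $b$ and a context dimension $c$:

```latex
\begin{aligned}
\mathrm{PMI}(ab,c) - \mathrm{PMI}(a,c) - \mathrm{PMI}(b,c)
  &= \log\frac{p(ab,c)\,p(a)\,p(b)\,p(c)}{p(ab)\,p(a,c)\,p(b,c)} \\
  &= \log\frac{p(ab\mid c)}{p(a\mid c)\,p(b\mid c)}
     - \log\frac{p(ab)}{p(a)\,p(b)} \\
  &= \mathrm{PMI}_c(a,b) - \mathrm{PMI}(a,b),
\end{aligned}
```

where $\mathrm{PMI}_c$ is PMI computed within context $c$: the difference measures how much the context raises or lowers the internal cohesion of the phrase, as the abstract states.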


Posted Content
TL;DR: A system that, given visual representations of a referent and a context, identifies their discriminative attributes, i.e., properties that distinguish them (has_tail), and despite the lack of direct supervision at the attribute level, the model learns to assign plausible attributes to objects (sofa-has_cushion).
Abstract: As a first step towards agents learning to communicate about their visual environment, we propose a system that, given visual representations of a referent (cat) and a context (sofa), identifies their discriminative attributes, i.e., properties that distinguish them (has_tail). Moreover, despite the lack of direct supervision at the attribute level, the model learns to assign plausible attributes to objects (sofa-has_cushion). Finally, we present a preliminary experiment confirming the referential success of the predicted discriminative attributes.
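A minimal sketch of the task, not the model: if attribute predictions for the referent and the context were already available as sets, the discriminative attributes would be those the referent has and the context lacks. The attribute sets below are made up for illustration; the paper's system must instead predict attributes from visual vectors without attribute-level supervision.

```python
def discriminative_attributes(referent_attrs, context_attrs):
    """Attributes that hold of the referent but not of the context;
    these are the ones worth communicating to tell the two apart."""
    return referent_attrs - context_attrs

# Hypothetical attribute sets for a referent (cat) and a context (sofa).
CAT = {"has_tail", "has_fur", "is_animal"}
SOFA = {"has_cushion", "has_legs", "has_fur"}  # a plush sofa, say
```

Here `has_fur` is shared and therefore not discriminative, while `has_tail` and `is_animal` distinguish the cat from the sofa.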

Book ChapterDOI
TL;DR: This article introduced a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended object if the expression has a unique referent, or indicates a failure, if it does not.
Abstract: One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects represented by natural images, points to the intended object if the expression has a unique referent, or indicates a failure, if it does not. The model, directly trained on reference acts, is competitive with a pipeline manually engineered to perform the same task, both when referents are purely visual, and when they are characterized by a combination of visual and linguistic properties.
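A toy stand-in for the resolution step described above (the paper trains a network end-to-end on reference acts; a hand-set similarity threshold like this one is closer to the engineered pipeline it is compared against): score each candidate object against the description, point to it only if the match is unique, and signal failure otherwise.

```python
def resolve_reference(description, objects, threshold=0.5):
    """Return the index of the unique object whose vector matches the
    description vector above `threshold`; return None (reference
    failure) if zero or several objects match."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    matches = [i for i, obj in enumerate(objects)
               if dot(description, obj) > threshold]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` when several objects match captures the case where a definite description fails to individuate a unique referent.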