Author

Marco Baroni

Bio: Marco Baroni is an academic researcher from Facebook. The author has contributed to research on the following topics: Distributional semantics & Semantic similarity. The author has an h-index of 58 and has co-authored 227 publications receiving 15594 citations. Previous affiliations of Marco Baroni include Austrian Research Institute for Artificial Intelligence & Catalan Institution for Research and Advanced Studies.


Papers
Proceedings Article
01 Jun 2014
TL;DR: An extensive comparison of context-predicting models with classic, count-vector-based distributional semantic approaches, on a wide range of lexical semantics tasks and across many parameter settings, shows that the buzz around these models is fully justified.
Abstract: Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. In this paper, we perform such an extensive evaluation, on a wide range of lexical semantics tasks and across many parameter settings. The results, to our own surprise, show that the buzz is fully justified, as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts.

1,405 citations

Journal Article
10 Feb 2009
TL;DR: This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German, and Italian built by web crawling, and describes the methodology and tools used in their construction.
Abstract: This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German, and Italian built by web crawling, and describes the methodology and tools used in their construction. The corpora contain more than a billion words each, and are thus among the largest resources for the respective languages. The paper also provides an evaluation of their suitability for linguistic research, focusing on ukWaC and itWaC. A comparison in terms of lexical coverage with existing resources for the languages of interest produces encouraging results. Qualitative evaluation of ukWaC versus the British National Corpus was also conducted, so as to highlight differences in corpus composition (text types and subject matters). The article concludes with practical information about format and availability of corpora and tools.

1,132 citations

Journal Article
TL;DR: This work proposes a flexible architecture to integrate text- and image-based distributional information, and shows in a set of empirical tests that the integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.
Abstract: Distributional semantic models derive computational representations of word meaning from the patterns of co-occurrence of words in text. Such models have been a success story of computational linguistics, being able to provide reliable estimates of semantic relatedness for the many semantic tasks requiring them. However, distributional models extract meaning information exclusively from text, which is an extremely impoverished basis compared to the rich perceptual sources that ground human semantic knowledge. We address the lack of perceptual grounding of distributional models by exploiting computer vision techniques that automatically identify discrete "visual words" in images, so that the distributional representation of a word can be extended to also encompass its co-occurrence with the visual words of images it is associated with. We propose a flexible architecture to integrate text- and image-based distributional information, and we show in a set of empirical tests that our integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.

900 citations

Proceedings Article
01 May 2014
TL;DR: This work aims to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large size English benchmark tailored for them.
Abstract: Shared and internationally recognized benchmarks are fundamental for the development of any computational system. We aim to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large size English benchmark tailored for them. SICK consists of about 10,000 English sentence pairs that include many examples of the lexical, syntactic and semantic phenomena that CDSMs are expected to account for, but do not require dealing with other aspects of existing sentential data sets (idiomatic multiword expressions, named entities, telegraphic language) that are not within the scope of CDSMs. By means of crowdsourcing techniques, each pair was annotated for two crucial semantic tasks: relatedness in meaning (with a 5-point rating scale as gold score) and entailment relation between the two elements (with three possible gold labels: entailment, contradiction, and neutral). The SICK data set was used in SemEval-2014 Task 1, and it is freely available for research purposes.

732 citations

Journal Article
TL;DR: The Distributional Memory approach is shown to be tenable despite the constraints imposed by its multi-purpose nature, and performs competitively against task-specific algorithms recently reported in the literature for the same tasks, and against several state-of-the-art methods.
Abstract: Research into corpus-based semantics has focused on the development of ad hoc models that treat single tasks, or sets of closely related tasks, as unrelated challenges to be tackled by extracting different kinds of distributional information from the corpus. As an alternative to this "one task, one model" approach, the Distributional Memory framework extracts distributional information once and for all from the corpus, in the form of a set of weighted word-link-word tuples arranged into a third-order tensor. Different matrices are then generated from the tensor, and their rows and columns constitute natural spaces to deal with different semantic problems. In this way, the same distributional information can be shared across tasks such as modeling word similarity judgments, discovering synonyms, concept categorization, predicting selectional preferences of verbs, solving analogy problems, classifying relations between word pairs, harvesting qualia structures with patterns or example pairs, predicting the typical properties of concepts, and classifying verbs into alternation classes. Extensive empirical testing in all these domains shows that a Distributional Memory implementation performs competitively against task-specific algorithms recently reported in the literature for the same tasks, and against our implementations of several state-of-the-art methods. The Distributional Memory approach is thus shown to be tenable despite the constraints imposed by its multi-purpose nature.

671 citations


Cited by
Proceedings Article
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.

30,558 citations

Proceedings Article
28 May 2020
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

10,132 citations

Journal Article
TL;DR: This paper proposes a new approach based on the skip-gram model in which each word is represented as a bag of character n-grams and a word's vector is the sum of its n-gram representations; the method trains quickly on large corpora and can compute representations for words that did not appear in the training data.
Abstract: Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models to learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated to each character n-gram, words being represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it can compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.

7,537 citations

Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations

01 Jan 2009

7,241 citations