Search or ask a question

Showing papers by "Michael Collins published in 2018"

PDF

Open Access

Posted Content•

Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency

[...]

Zhuang Ma¹, Michael Collins²•Institutions (2)

University of Pennsylvania¹, Columbia University²

06 Sep 2018-arXiv: Computation and Language

TL;DR: It is shown that the ranking-based variant of NCE gives consistent parameter estimates under weaker assumptions than the classification-based method, which is closely related to negative sampling methods, now widely used in NLP.

...read moreread less

Abstract: Noise Contrastive Estimation (NCE) is a powerful parameter estimation method for log-linear models, which avoids calculation of the partition function or its derivatives at each training step, a computationally demanding step in many cases. It is closely related to negative sampling methods, now widely used in NLP. This paper considers NCE-based estimation of conditional models. Conditional models are frequently encountered in practice; however there has not been a rigorous theoretical analysis of NCE in this setting, and we will argue there are subtle but important questions when generalizing NCE to the conditional case. In particular, we analyze two variants of NCE for conditional models: one based on a classification objective, the other based on a ranking objective. We show that the ranking-based variant of NCE gives consistent parameter estimates under weaker assumptions than the classification-based method; we analyze the statistical efficiency of the ranking-based and classification-based variants of NCE; finally we describe experiments on synthetic data and language modeling showing the effectiveness and trade-offs of both methods.

...read moreread less

83 citations

Proceedings Article•DOI•

Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency

[...]

Zhuang Ma¹, Michael Collins²•Institutions (2)

University of Pennsylvania¹, Columbia University²

01 Jan 2018

TL;DR: In this paper, two variants of Noise Contrastive Estimation (NCE) for conditional models were proposed: one based on a classification objective and the other based on ranking objective.

...read moreread less

Abstract: Noise Contrastive Estimation (NCE) is a powerful parameter estimation method for log-linear models, which avoids calculation of the partition function or its derivatives at each training step, a computationally demanding step in many cases It is closely related to negative sampling methods, now widely used in NLP This paper considers NCE-based estimation of conditional models Conditional models are frequently encountered in practice; however there has not been a rigorous theoretical analysis of NCE in this setting, and we will argue there are subtle but important questions when generalizing NCE to the conditional case In particular, we analyze two variants of NCE for conditional models: one based on a classification objective, the other based on a ranking objective We show that the ranking-based variant of NCE gives consistent parameter estimates under weaker assumptions than the classification-based method; we analyze the statistical efficiency of the ranking-based and classification-based variants of NCE; finally we describe experiments on synthetic data and language modeling showing the effectiveness and tradeoffs of both methods

...read moreread less

58 citations

Posted Content•

Improving Span-based Question Answering Systems with Coarsely Labeled Data.

[...]

Hao Cheng, Ming-Wei Chang, Kenton Lee, Ankur P. Parikh, Michael Collins, Kristina Toutanova - Show less +2 more

05 Nov 2018-arXiv: Computation and Language

TL;DR: This work studies approaches to improve fine-grained short answer Question Answering models by integrating coarse- grained data annotated for paragraph-level relevance and shows that coarsely annotated data can bring significant performance gains.

...read moreread less

Abstract: We study approaches to improve fine-grained short answer Question Answering models by integrating coarse-grained data annotated for paragraph-level relevance and show that coarsely annotated data can bring significant performance gains. Experiments demonstrate that the standard multi-task learning approach of sharing representations is not the most effective way to leverage coarse-grained annotations. Instead, we can explicitly model the latent fine-grained short answer variables and optimize the marginal log-likelihood directly or use a newly proposed \emph{posterior distillation} learning objective. Since these latent-variable methods have explicit access to the relationship between the fine and coarse tasks, they result in significantly larger improvements from coarse supervision.

...read moreread less

1 citations