Showing papers by "Michael Collins published in 2007"


Journal ArticleDOI
TL;DR: A discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations; a hidden-state conditional random field framework learns a set of latent variables conditioned on local features.
Abstract: We present a discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations. A hidden-state conditional random field framework learns a set of latent variables conditioned on local features. Observations need not be independent and may overlap in space and time.
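
To make the model concrete, here is a minimal sketch of hidden-state CRF classification on a chain of local observations, assuming per-label emission and transition parameters; the names (hcrf_class_scores, theta_emit, theta_trans) are illustrative rather than from the paper, and general graph-structured inputs would replace the chain recursion with message passing.

import numpy as np

def hcrf_class_scores(X, theta_emit, theta_trans):
    """X: (T, d) local observation features; theta_emit: (Y, H, d);
    theta_trans: (Y, H, H). Returns class log-probabilities obtained by
    marginalizing over hidden-state sequences with a forward pass."""
    Y, H, d = theta_emit.shape
    log_scores = np.empty(Y)
    for c in range(Y):
        alpha = theta_emit[c] @ X[0]                 # (H,) initial scores
        for t in range(1, len(X)):
            m = alpha.max()
            # logsumexp over the previous hidden state, plus emissions
            alpha = (theta_emit[c] @ X[t] + m
                     + np.log(np.exp(alpha - m) @ np.exp(theta_trans[c])))
        m = alpha.max()
        log_scores[c] = m + np.log(np.exp(alpha - m).sum())
    m = log_scores.max()
    return log_scores - (m + np.log(np.exp(log_scores - m).sum()))

rng = np.random.default_rng(0)
logp = hcrf_class_scores(rng.normal(size=(6, 4)),           # 6 observations
                         0.1 * rng.normal(size=(2, 3, 4)),  # 2 labels, 3 states
                         0.1 * rng.normal(size=(2, 3, 3)))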

578 citations


Proceedings Article
01 Jun 2007
TL;DR: A key idea is to introduce non-standard CCG combinators that relax certain parts of the grammar—for example allowing flexible word order, or insertion of lexical items—with learned costs.
Abstract: We consider the problem of learning to parse sentences to lambda-calculus representations of their underlying semantics and present an algorithm that learns a weighted combinatory categorial grammar (CCG). A key idea is to introduce non-standard CCG combinators that relax certain parts of the grammar—for example allowing flexible word order, or insertion of lexical items—with learned costs. We also present a new, online algorithm for inducing a weighted CCG. Results for the approach on ATIS data show 86% F-measure in recovering fully correct semantic analyses and 95.9% F-measure by a partial-match criterion, a more than 5% improvement over the 90.3% partial-match figure reported by He and Young (2006).
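
As a concrete illustration of the online learning step, here is a minimal perceptron-style sketch in which relaxed-combinator uses appear as features whose learned weights act as costs; the feature map and derivation encoding are hypothetical stand-ins, not the paper's implementation, which scores full weighted CCG derivations.

from collections import Counter

def features(derivation):
    # count each combinator use; a relaxed combinator such as crossed
    # composition gets its own feature, so it can acquire a learned cost
    return Counter(derivation)

def score(weights, derivation):
    return sum(weights.get(f, 0.0) * c
               for f, c in features(derivation).items())

def online_update(weights, gold, predicted, lr=1.0):
    """Move weights toward the gold derivation's features and away from
    the model's current best derivation."""
    for f, c in features(gold).items():
        weights[f] = weights.get(f, 0.0) + lr * c
    for f, c in features(predicted).items():
        weights[f] = weights.get(f, 0.0) - lr * c
    return weights

w = {}
gold = ["fwd_apply", "fwd_apply"]
pred = ["fwd_apply", "crossed_compose"]  # used a relaxed combinator
w = online_update(w, gold, pred)
# w now penalizes "crossed_compose" relative to "fwd_apply"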

490 citations


Proceedings Article
01 Jun 2007
TL;DR: A set of syntactic reordering rules that exploit systematic differences between Chinese and English word order is described; used as a preprocessor for both training and test sentences, the rules transform Chinese sentences to be much closer to English in terms of their word order.
Abstract: Syntactic reordering approaches are an effective method for handling word-order differences between source and target languages in statistical machine translation (SMT) systems. This paper introduces a reordering approach for translation from Chinese to English. We describe a set of syntactic reordering rules that exploit systematic differences between Chinese and English word order. The resulting system is used as a preprocessor for both training and test sentences, transforming Chinese sentences to be much closer to English in terms of their word order. We evaluated the reordering approach within the MOSES phrase-based SMT system (Koehn et al., 2007). The reordering approach improved the BLEU score for the MOSES system from 28.52 to 30.86 on the NIST 2006 evaluation data. We also conducted a series of experiments to analyze the accuracy and impact of different types of reordering rules.
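
To show the flavor of such a preprocessor, here is a minimal sketch of one reordering rule on constituency trees, assuming trees encoded as (label, children) tuples; the specific rule (moving a preverbal PP after the verb) illustrates the rule type rather than reproducing the paper's rule set.

def reorder(tree):
    """Recursively apply a single Chinese-to-English reordering rule."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [reorder(c) for c in children]
    # Rule: inside a VP, move PPs that precede the verb to after it,
    # since English prefers verb-PP order where Chinese uses PP-verb.
    if label == "VP" and len(children) >= 2:
        head = children[-1]
        pps = [c for c in children[:-1]
               if not isinstance(c, str) and c[0] == "PP"]
        rest = [c for c in children[:-1]
                if isinstance(c, str) or c[0] != "PP"]
        children = rest + [head] + pps
    return (label, children)

# (VP (PP zai (NP Beijing)) (VV gongzuo)) -> (VP (VV gongzuo) (PP ...))
tree = ("VP", [("PP", ["zai", ("NP", ["Beijing"])]), ("VV", ["gongzuo"])])
print(reorder(tree))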

240 citations


Journal ArticleDOI
TL;DR: This paper describes a method based on regularized likelihood that makes use of the feature set given by the perceptron algorithm, and initialization with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone.
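
A minimal sketch of the two-stage idea, assuming a dense multiclass setting for brevity (the paper reranks n-best speech hypotheses with sparse features): train a perceptron first, then refine its weights by gradient descent on an L2-regularized conditional log-likelihood.

import numpy as np

def perceptron(X, y, n_classes, epochs=5):
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.argmax(W @ xi))
            if pred != yi:                 # standard mistake-driven update
                W[yi] += xi
                W[pred] -= xi
    return W

def refine_regularized_loglik(X, y, W, reg=0.1, lr=0.1, steps=200):
    for _ in range(steps):
        logits = X @ W.T
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        P[np.arange(len(y)), y] -= 1.0             # softmax gradient
        W -= lr * (P.T @ X / len(y) + reg * W)     # regularized step
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)); y = rng.integers(0, 3, 100)
W = perceptron(X, y, n_classes=3)          # stage 1: perceptron weights
W = refine_regularized_loglik(X, y, W)     # stage 2: initialize from them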

215 citations


Proceedings Article
01 Jun 2007
TL;DR: It is shown how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff’s Matrix-Tree Theorem; the algorithm is used in training both log-linear and max-margin dependency parsers.
Abstract: This paper provides an algorithmic framework for learning statistical models involving directed spanning trees, or equivalently non-projective dependency structures. We show how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff’s Matrix-Tree Theorem. To demonstrate an application of the method, we perform experiments which use the algorithm in training both log-linear and max-margin dependency parsers. The new training methods give improvements in accuracy over perceptron-trained models.
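
For concreteness, here is a minimal sketch of the central computation: with exponentiated edge scores as weights, the partition function over directed spanning trees (arborescences) rooted at a fixed node is the determinant of a Laplacian minor, and edge marginals follow from the inverse of the same minor. The function name and score convention are illustrative.

import numpy as np

def spanning_tree_partition(scores, root=0):
    """scores[h, m] = score of the edge from head h to modifier m.
    Returns Z = sum over arborescences rooted at `root` of the product
    of exp(edge score), via the directed Matrix-Tree Theorem."""
    n = scores.shape[0]
    W = np.exp(scores)
    np.fill_diagonal(W, 0.0)                       # no self-edges
    L = -W
    L[np.arange(n), np.arange(n)] = W.sum(axis=0)  # total incoming weight
    keep = [i for i in range(n) if i != root]
    return np.linalg.det(L[np.ix_(keep, keep)])

rng = np.random.default_rng(0)
Z = spanning_tree_partition(rng.normal(size=(4, 4)))
# Edge marginals (and hence gradients for training) come from entries of
# the inverse of the same Laplacian minor, i.e. derivatives of log Z.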

164 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper describes a method for learning representations from large quantities of unlabeled images which have associated captions; it significantly outperforms a fully-supervised baseline model and a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone.
Abstract: Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional, but nevertheless captures the information required to discriminate between image categories. This paper describes a method for learning representations from large quantities of unlabeled images which have associated captions; the goal is to improve learning in future image classification problems. Experiments show that our method significantly outperforms (1) a fully-supervised baseline model, (2) a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone and (3) a model that uses the output of word classifiers trained using captions and unlabeled data. Our current work concentrates on captions as the source of meta-data, but more generally other types of meta-data could be used.
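
A minimal sketch of the general idea, under simplifying assumptions: fit linear predictors of caption words from visual features on the unlabeled captioned images, then take a low-rank basis of those predictors as the image representation. Ridge regression and a plain SVD here are stand-ins for the paper's actual structure-learning procedure.

import numpy as np

def caption_representation(V, word_occurs, k=10, reg=1.0):
    """V: (n, d) visual features; word_occurs: (n, m) binary indicators
    of caption words. Returns a (d, k) projection onto directions of the
    visual features that are predictive of caption words."""
    d = V.shape[1]
    # one ridge-regression predictor per caption word (columns of A)
    A = np.linalg.solve(V.T @ V + reg * np.eye(d), V.T @ word_occurs)
    # shared low-dimensional structure across predictors via SVD
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

rng = np.random.default_rng(0)
V = rng.normal(size=(200, 50))
words = (rng.random((200, 300)) < 0.05).astype(float)
P = caption_representation(V, words, k=10)
low_dim = V @ P    # compact features for a future supervised classifier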

111 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: This paper describes an exponentiated gradient (EG) algorithm for training conditional log-linear models, which results in both sequential and parallel update algorithms, and provides a convergence proof for both algorithms.
Abstract: Conditional log-linear models are a commonly used method for structured prediction. Efficient learning of parameters in these models is therefore an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maximum likelihood objective; this results in both sequential and parallel update algorithms, where in the sequential algorithm parameters are updated in an online fashion. We provide a convergence proof for both algorithms. Our analysis also simplifies previous results on EG for max-margin models, and leads to a tighter bound on convergence rates. Experiments on a large-scale parsing task show that the proposed algorithm converges much faster than conjugate-gradient and L-BFGS approaches both in terms of optimization objective and test error.
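
To make the dual updates concrete, here is a minimal sketch of EG on the dual of an L2-regularized multiclass logistic model (the simplest conditional log-linear case); the dual form and constants follow one standard presentation and are an illustration, not a transcription of the paper, whose structured version operates over parts of parse trees.

import numpy as np

def eg_train(X, y, n_classes, C=1.0, eta=0.5, iters=200):
    """Each example holds a dual distribution over labels; EG updates it
    multiplicatively, and primal weights are recovered from the duals."""
    n, d = X.shape
    log_a = np.full((n, n_classes), -np.log(n_classes))

    def primal_w(log_a):
        # w = (1/C) * sum_i [ phi(x_i, y_i) - E_{alpha_i} phi(x_i, y) ]
        W = np.zeros((n_classes, d))
        np.add.at(W, y, X)                 # gold-label features
        W -= np.exp(log_a).T @ X           # expected features under alpha
        return W / C

    for _ in range(iters):
        margins = X @ primal_w(log_a).T    # model score of each (i, y)
        grad = log_a + 1.0 - margins       # dual gradient, up to row consts
        log_a = log_a - eta * grad         # multiplicative step, log space
        m = log_a.max(axis=1, keepdims=True)
        log_a -= m + np.log(np.exp(log_a - m).sum(axis=1, keepdims=True))
    return primal_w(log_a)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)); y = rng.integers(0, 3, 100)
W = eg_train(X, y, n_classes=3)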

53 citations


Proceedings ArticleDOI
27 Aug 2007
TL;DR: This paper applies the neighborhood components analysis (NCA) method to acoustic modeling in a speech recognizer, learning a projection of acoustic vectors that optimizes a criterion that is closely related to the classification accuracy of a nearest-neighbor classifier.
Abstract: Previous work has considered methods for learning projections of high-dimensional acoustic representations to lower dimensional spaces. In this paper we apply the neighborhood components analysis (NCA) [2] method to acoustic modeling in a speech recognizer. NCA learns a projection of acoustic vectors that optimizes a criterion that is closely related to the classification accuracy of a nearest-neighbor classifier. We introduce regularization into this method, giving further improvements in performance. We describe experiments on a lecture transcription task, comparing projections learned using NCA and HLDA [1]. Regularized NCA gives a 0.7% absolute reduction in WER over HLDA, which corresponds to a relative reduction of 1.9%. Index Terms: speech recognition, acoustic modeling, dimensionality reduction
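
A minimal sketch of the regularized objective, assuming an L2 penalty on the projection (the paper's exact regularizer may differ): NCA maximizes the expected leave-one-out accuracy of a soft nearest-neighbor classifier in the projected space, and the projection A can then be fit with any gradient-based optimizer.

import numpy as np

def nca_objective(A, X, y, lam=0.01):
    """A: (k, d) projection; X: (n, d) acoustic vectors; y: (n,) labels.
    Returns the regularized NCA objective (to be maximized)."""
    Z = X @ A.T                                      # project to k dims
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(sq, np.inf)                     # exclude self-matches
    P = np.exp(-(sq - sq.min(axis=1, keepdims=True)))  # stabilized softmax
    P /= P.sum(axis=1, keepdims=True)                # soft-neighbor probs
    same = y[:, None] == y[None, :]
    return (P * same).sum() - lam * (A ** 2).sum()   # accuracy - penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8)); y = rng.integers(0, 4, 40)
A = 0.1 * rng.normal(size=(3, 8))
print(nca_objective(A, X, y))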

53 citations


Journal Article
TL;DR: The preintervention MSA was a major predictor of larger lumen area after repeat intervention for DES restenosis; since several IVUS studies have shown that stent dimensions do not change over time, the MSA of the original stent continues to constrain subsequent interventions.
Abstract: BACKGROUND The intravascular ultrasound (IVUS) findings during repeat intervention for drug-eluting stent (DES) restenosis have not been well described. METHODS We identified 62 consecutive DES restenosis lesions (45 sirolimus-eluting stents and 17 paclitaxel-eluting stents) undergoing repeat intervention with pre and postintervention IVUS. Lumen, stent and intimal hyperplasia (stent minus lumen) areas were measured at the minimal lumen area (MLA) site and minimal stent area (MSA) site. RESULTS Repeat stent implantation was performed in 55 lesions (88.7%). Overall, MLA increased from 2.3 +/- 0.7 mm(2) preintervention to 4.6 +/- 1.6 mm(2) postintervention. Preintervention MLA was seen at exactly the preintervention MSA site in 42%, while 73% of postintervention MLAs were located at the preintervention MSA site. There was a strong correlation between the preintervention MSA and the postintervention MLA (r = 0.79; p < 0.001). Preintervention MSA was the strongest independent predictor of a larger postintervention MLA (coefficient 0.72; p < 0.001). CONCLUSIONS The preintervention MSA was a major predictor of larger lumen area after repeat intervention for DES restenosis. Several IVUS studies have shown that stent dimensions do not change over time. Therefore, the MSA of the original stent implantation procedure still has the greatest impact on subsequent interventions to treat DES restenosis.

5 citations


01 Jan 2007
TL;DR: This paper describes a method for incorporating discourse-level triggers into a discriminative language model and introduces triggers that are specific to particular unigrams and bigrams, as well as "back-off" trigger features that allow generalizations to be made across different unigrams.
Abstract: Discriminative language models using n-gram features have been shown to be effective in reducing speech recognition word error rates. In this paper we describe a method for incorporating discourse-level triggers into a discriminative language model. Triggers are features identifying re-occurrence of words within a conversation. We introduce triggers that are specific to particular unigrams and bigrams, as well as "back-off" trigger features that allow generalizations to be made across different unigrams. We train our model using a new loss-sensitive variant of the perceptron algorithm that makes effective use of information from multiple hypotheses in an n-best list. We train and test on the Switchboard data set and show a 0.5 absolute reduction in WER over a baseline discriminative model which uses n-gram features alone, and a 1.5 absolute reduction in WER over the baseline recognizer. Index Terms: Perceptrons, Speech recognition, Natural languages
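
As a concrete illustration, here is a minimal sketch of trigger feature extraction for reranking an n-best hypothesis: a word-specific trigger fires when a hypothesis word re-occurred from earlier in the conversation, and a single back-off trigger generalizes across all such re-occurrences. The feature names are illustrative, not the paper's exact templates, which also include bigram triggers.

from collections import Counter

def trigger_features(hypothesis_words, history_words):
    """Fire trigger features for hypothesis words already seen earlier
    in the conversation."""
    seen = set(history_words)
    feats = Counter()
    for w in hypothesis_words:
        if w in seen:
            feats[f"trigger_unigram={w}"] += 1  # word-specific trigger
            feats["trigger_backoff"] += 1       # generalized trigger
    return feats

history = "i flew to boston last week".split()
hyp = "the flight to boston was late".split()
print(trigger_features(hyp, history))
# Counter({'trigger_backoff': 2, 'trigger_unigram=to': 1,
#          'trigger_unigram=boston': 1})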