Showing papers by "Michael Collins published in 2007"


Journal ArticleDOI
TL;DR: A discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations; a hidden-state conditional random field framework learns a set of latent variables conditioned on local features.
Abstract: We present a discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations. A hidden-state conditional random field framework learns a set of latent variables conditioned on local features. Observations need not be independent and may overlap in space and time.
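
To make the model concrete, here is a minimal sketch of hidden-state CRF classification on a chain of local observations, assuming per-label emission and transition parameters; the names (hcrf_class_scores, theta_emit, theta_trans) are illustrative rather than from the paper, and general graph-structured inputs would replace the chain recursion with message passing.

import numpy as np

def hcrf_class_scores(X, theta_emit, theta_trans):
    """X: (T, d) local observation features; theta_emit: (Y, H, d);
    theta_trans: (Y, H, H). Returns class log-probabilities obtained by
    marginalizing over hidden-state sequences with a forward pass."""
    Y, H, d = theta_emit.shape
    log_scores = np.empty(Y)
    for c in range(Y):
        alpha = theta_emit[c] @ X[0]                 # (H,) initial scores
        for t in range(1, len(X)):
            m = alpha.max()
            # logsumexp over the previous hidden state, plus emissions
            alpha = (theta_emit[c] @ X[t] + m
                     + np.log(np.exp(alpha - m) @ np.exp(theta_trans[c])))
        m = alpha.max()
        log_scores[c] = m + np.log(np.exp(alpha - m).sum())
    m = log_scores.max()
    return log_scores - (m + np.log(np.exp(log_scores - m).sum()))

rng = np.random.default_rng(0)
logp = hcrf_class_scores(rng.normal(size=(6, 4)),           # 6 observations
                         0.1 * rng.normal(size=(2, 3, 4)),  # 2 labels, 3 states
                         0.1 * rng.normal(size=(2, 3, 3)))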

578 citations


Proceedings Article
01 Jun 2007
TL;DR: A key idea is to introduce non-standard CCG combinators that relax certain parts of the grammar—for example allowing flexible word order, or insertion of lexical items—with learned costs.
Abstract: We consider the problem of learning to parse sentences to lambda-calculus representations of their underlying semantics and present an algorithm that learns a weighted combinatory categorial grammar (CCG). A key idea is to introduce non-standard CCG combinators that relax certain parts of the grammar—for example allowing flexible word order, or insertion of lexical items—with learned costs. We also present a new, online algorithm for inducing a weighted CCG. Results for the approach on ATIS data show 86% F-measure in recovering fully correct semantic analyses and 95.9% F-measure by a partial-match criterion, a more than 5% improvement over the 90.3% partial-match figure reported by He and Young (2006).
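
As a concrete illustration of the online learning step, here is a minimal perceptron-style sketch in which relaxed-combinator uses appear as features whose learned weights act as costs; the feature map and derivation encoding are hypothetical stand-ins, not the paper's implementation, which scores full weighted CCG derivations.

from collections import Counter

def features(derivation):
    # count each combinator use; a relaxed combinator such as crossed
    # composition gets its own feature, so it can acquire a learned cost
    return Counter(derivation)

def score(weights, derivation):
    return sum(weights.get(f, 0.0) * c
               for f, c in features(derivation).items())

def online_update(weights, gold, predicted, lr=1.0):
    """Move weights toward the gold derivation's features and away from
    the model's current best derivation."""
    for f, c in features(gold).items():
        weights[f] = weights.get(f, 0.0) + lr * c
    for f, c in features(predicted).items():
        weights[f] = weights.get(f, 0.0) - lr * c
    return weights

w = {}
gold = ["fwd_apply", "fwd_apply"]
pred = ["fwd_apply", "crossed_compose"]  # used a relaxed combinator
w = online_update(w, gold, pred)
# w now penalizes "crossed_compose" relative to "fwd_apply"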

490 citations


Proceedings Article
01 Jun 2007
TL;DR: A set of syntactic reordering rules that exploit systematic differences between Chinese and English word order is described; used as a preprocessor for both training and test sentences, the rules transform Chinese sentences to be much closer to English in terms of their word order.
Abstract: Syntactic reordering approaches are an effective method for handling word-order differences between source and target languages in statistical machine translation (SMT) systems. This paper introduces a reordering approach for translation from Chinese to English. We describe a set of syntactic reordering rules that exploit systematic differences between Chinese and English word order. The resulting system is used as a preprocessor for both training and test sentences, transforming Chinese sentences to be much closer to English in terms of their word order. We evaluated the reordering approach within the MOSES phrase-based SMT system (Koehn et al., 2007). The reordering approach improved the BLEU score for the MOSES system from 28.52 to 30.86 on the NIST 2006 evaluation data. We also conducted a series of experiments to analyze the accuracy and impact of different types of reordering rules.
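
To show the flavor of such a preprocessor, here is a minimal sketch of one reordering rule on constituency trees, assuming trees encoded as (label, children) tuples; the specific rule (moving a preverbal PP after the verb) illustrates the rule type rather than reproducing the paper's rule set.

def reorder(tree):
    """Recursively apply a single Chinese-to-English reordering rule."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [reorder(c) for c in children]
    # Rule: inside a VP, move PPs that precede the verb to after it,
    # since English prefers verb-PP order where Chinese uses PP-verb.
    if label == "VP" and len(children) >= 2:
        head = children[-1]
        pps = [c for c in children[:-1]
               if not isinstance(c, str) and c[0] == "PP"]
        rest = [c for c in children[:-1]
                if isinstance(c, str) or c[0] != "PP"]
        children = rest + [head] + pps
    return (label, children)

# (VP (PP zai (NP Beijing)) (VV gongzuo)) -> (VP (VV gongzuo) (PP ...))
tree = ("VP", [("PP", ["zai", ("NP", ["Beijing"])]), ("VV", ["gongzuo"])])
print(reorder(tree))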

240 citations


Journal ArticleDOI
TL;DR: This paper describes a method based on regularized likelihood that makes use of the feature set given by the perceptron algorithm, and initialization with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone.
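
A minimal sketch of the two-stage idea, assuming a dense multiclass setting for brevity (the paper reranks n-best speech hypotheses with sparse features): train a perceptron first, then refine its weights by gradient descent on an L2-regularized conditional log-likelihood.

import numpy as np

def perceptron(X, y, n_classes, epochs=5):
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.argmax(W @ xi))
            if pred != yi:                 # standard mistake-driven update
                W[yi] += xi
                W[pred] -= xi
    return W

def refine_regularized_loglik(X, y, W, reg=0.1, lr=0.1, steps=200):
    for _ in range(steps):
        logits = X @ W.T
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        P[np.arange(len(y)), y] -= 1.0             # softmax gradient
        W -= lr * (P.T @ X / len(y) + reg * W)     # regularized step
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)); y = rng.integers(0, 3, 100)
W = perceptron(X, y, n_classes=3)          # stage 1: perceptron weights
W = refine_regularized_loglik(X, y, W)     # stage 2: initialize from them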

215 citations


Proceedings Article
01 Jun 2007
TL;DR: It is shown how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff’s Matrix-Tree Theorem; the algorithm is used in training both log-linear and max-margin dependency parsers.
Abstract: This paper provides an algorithmic framework for learning statistical models involving directed spanning trees, or equivalently non-projective dependency structures. We show how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff’s Matrix-Tree Theorem. To demonstrate an application of the method, we perform experiments which use the algorithm in training both log-linear and max-margin dependency parsers. The new training methods give improvements in accuracy over perceptron-trained models.
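
For concreteness, here is a minimal sketch of the central computation: with exponentiated edge scores as weights, the partition function over directed spanning trees (arborescences) rooted at a fixed node is the determinant of a Laplacian minor, and edge marginals follow from the inverse of the same minor. The function name and score convention are illustrative.

import numpy as np

def spanning_tree_partition(scores, root=0):
    """scores[h, m] = score of the edge from head h to modifier m.
    Returns Z = sum over arborescences rooted at `root` of the product
    of exp(edge score), via the directed Matrix-Tree Theorem."""
    n = scores.shape[0]
    W = np.exp(scores)
    np.fill_diagonal(W, 0.0)                       # no self-edges
    L = -W
    L[np.arange(n), np.arange(n)] = W.sum(axis=0)  # total incoming weight
    keep = [i for i in range(n) if i != root]
    return np.linalg.det(L[np.ix_(keep, keep)])

rng = np.random.default_rng(0)
Z = spanning_tree_partition(rng.normal(size=(4, 4)))
# Edge marginals (and hence gradients for training) come from entries of
# the inverse of the same Laplacian minor, i.e. derivatives of log Z.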

164 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper describes a method for learning representations from large quantities of unlabeled images which have associated captions; it significantly outperforms a fully-supervised baseline model and a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone.
Abstract: Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional, but nevertheless captures the information required to discriminate between image categories. This paper describes a method for learning representations from large quantities of unlabeled images which have associated captions; the goal is to improve learning in future image classification problems. Experiments show that our method significantly outperforms (1) a fully-supervised baseline model, (2) a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone and (3) a model that uses the output of word classifiers trained using captions and unlabeled data. Our current work concentrates on captions as the source of meta-data, but more generally other types of meta-data could be used.
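
A minimal sketch of the general idea, under simplifying assumptions: fit linear predictors of caption words from visual features on the unlabeled captioned images, then take a low-rank basis of those predictors as the image representation. Ridge regression and a plain SVD here are stand-ins for the paper's actual structure-learning procedure.

import numpy as np

def caption_representation(V, word_occurs, k=10, reg=1.0):
    """V: (n, d) visual features; word_occurs: (n, m) binary indicators
    of caption words. Returns a (d, k) projection onto directions of the
    visual features that are predictive of caption words."""
    d = V.shape[1]
    # one ridge-regression predictor per caption word (columns of A)
    A = np.linalg.solve(V.T @ V + reg * np.eye(d), V.T @ word_occurs)
    # shared low-dimensional structure across predictors via SVD
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

rng = np.random.default_rng(0)
V = rng.normal(size=(200, 50))
words = (rng.random((200, 300)) < 0.05).astype(float)
P = caption_representation(V, words, k=10)
low_dim = V @ P    # compact features for a future supervised classifier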

111 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: This paper describes an exponentiated gradient (EG) algorithm for training conditional log-linear models, which results in both sequential and parallel update algorithms, and provides a convergence proof for both algorithms.
Abstract: Conditional log-linear models are a commonly used method for structured prediction. Efficient learning of parameters in these models is therefore an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maximum likelihood objective; this results in both sequential and parallel update algorithms, where in the sequential algorithm parameters are updated in an online fashion. We provide a convergence proof for both algorithms. Our analysis also simplifies previous results on EG for max-margin models, and leads to a tighter bound on convergence rates. Experiments on a large-scale parsing task show that the proposed algorithm converges much faster than conjugate-gradient and L-BFGS approaches both in terms of optimization objective and test error.
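
To make the dual updates concrete, here is a minimal sketch of EG on the dual of an L2-regularized multiclass logistic model (the simplest conditional log-linear case); the dual form and constants follow one standard presentation and are an illustration, not a transcription of the paper, whose structured version operates over parts of parse trees.

import numpy as np

def eg_train(X, y, n_classes, C=1.0, eta=0.5, iters=200):
    """Each example holds a dual distribution over labels; EG updates it
    multiplicatively, and primal weights are recovered from the duals."""
    n, d = X.shape
    log_a = np.full((n, n_classes), -np.log(n_classes))

    def primal_w(log_a):
        # w = (1/C) * sum_i [ phi(x_i, y_i) - E_{alpha_i} phi(x_i, y) ]
        W = np.zeros((n_classes, d))
        np.add.at(W, y, X)                 # gold-label features
        W -= np.exp(log_a).T @ X           # expected features under alpha
        return W / C

    for _ in range(iters):
        margins = X @ primal_w(log_a).T    # model score of each (i, y)
        grad = log_a + 1.0 - margins       # dual gradient, up to row consts
        log_a = log_a - eta * grad         # multiplicative step, log space
        m = log_a.max(axis=1, keepdims=True)
        log_a -= m + np.log(np.exp(log_a - m).sum(axis=1, keepdims=True))
    return primal_w(log_a)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)); y = rng.integers(0, 3, 100)
W = eg_train(X, y, n_classes=3)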

53 citations


Proceedings ArticleDOI
27 Aug 2007
TL;DR: This paper applies the neighborhood components analysis (NCA) method to acoustic modeling in a speech recognizer, learning a projection of acoustic vectors that optimizes a criterion that is closely related to the classification accuracy of a nearest-neighbor classifier.
Abstract: Previous work has considered methods for learning projections of high-dimensional acoustic representations to lower dimensional spaces. In this paper we apply the neighborhood components analysis (NCA) [2] method to acoustic modeling in a speech recognizer. NCA learns a projection of acoustic vectors that optimizes a criterion that is closely related to the classification accuracy of a nearest-neighbor classifier. We introduce regularization into this method, giving further improvements in performance. We describe experiments on a lecture transcription task, comparing projections learned using NCA and HLDA [1]. Regularized NCA gives a 0.7% absolute reduction in WER over HLDA, which corresponds to a relative reduction of 1.9%. Index Terms: speech recognition, acoustic modeling, dimensionality reduction
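
A minimal sketch of the regularized objective, assuming an L2 penalty on the projection (the paper's exact regularizer may differ): NCA maximizes the expected leave-one-out accuracy of a soft nearest-neighbor classifier in the projected space, and the projection A can then be fit with any gradient-based optimizer.

import numpy as np

def nca_objective(A, X, y, lam=0.01):
    """A: (k, d) projection; X: (n, d) acoustic vectors; y: (n,) labels.
    Returns the regularized NCA objective (to be maximized)."""
    Z = X @ A.T                                      # project to k dims
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(sq, np.inf)                     # exclude self-matches
    P = np.exp(-(sq - sq.min(axis=1, keepdims=True)))  # stabilized softmax
    P /= P.sum(axis=1, keepdims=True)                # soft-neighbor probs
    same = y[:, None] == y[None, :]
    return (P * same).sum() - lam * (A ** 2).sum()   # accuracy - penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8)); y = rng.integers(0, 4, 40)
A = 0.1 * rng.normal(size=(3, 8))
print(nca_objective(A, X, y))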

53 citations


Journal Article
TL;DR: The preintervention MSA was a major predictor of larger lumen area after repeat intervention for DES restenosis; since several IVUS studies have shown that stent dimensions do not change over time, the MSA of the original stent continues to constrain subsequent interventions.
Abstract: BACKGROUND The intravascular ultrasound (IVUS) findings during repeat intervention for drug-eluting stent (DES) restenosis have not been well described. METHODS We identified 62 consecutive DES restenosis lesions (45 sirolimus-eluting stents and 17 paclitaxel-eluting stents) undergoing repeat intervention with pre and postintervention IVUS. Lumen, stent and intimal hyperplasia (stent minus lumen) areas were measured at the minimal lumen area (MLA) site and minimal stent area (MSA) site. RESULTS Repeat stent implantation was performed in 55 lesions (88.7%). Overall, MLA increased from 2.3 +/- 0.7 mm(2) preintervention to 4.6 +/- 1.6 mm(2) postintervention. Preintervention MLA was seen at exactly the preintervention MSA site in 42%, while 73% of postintervention MLAs were located at the preintervention MSA site. There was a strong correlation between the preintervention MSA and the postintervention MLA (r = 0.79; p < 0.001). Preintervention MSA was the strongest independent predictor of a larger postintervention MLA (coefficient 0.72; p < 0.001). CONCLUSIONS The preintervention MSA was a major predictor of larger lumen area after repeat intervention for DES restenosis. Several IVUS studies have shown that stent dimensions do not change over time. Therefore, the MSA of the original stent implantation procedure still has the greatest impact on subsequent interventions to treat DES restenosis.

5 citations


01 Jan 2007
TL;DR: This paper describes a method for incorporating discourse-level triggers into a discriminative language model and introduces triggers that are specific to particular unigrams and bigrams, as well as "back-off" trigger features that allow generalizations to be made across different unigrams.
Abstract: Discriminative language models using n-gram features have been shown to be effective in reducing speech recognition word error rates. In this paper we describe a method for incorporating discourse-level triggers into a discriminative language model. Triggers are features identifying re-occurrence of words within a conversation. We introduce triggers that are specific to particular unigrams and bigrams, as well as "back-off" trigger features that allow generalizations to be made across different unigrams. We train our model using a new loss-sensitive variant of the perceptron algorithm that makes effective use of information from multiple hypotheses in an n-best list. We train and test on the Switchboard data set and show a 0.5 absolute reduction in WER over a baseline discriminative model which uses n-gram features alone, and a 1.5 absolute reduction in WER over the baseline recognizer. Index Terms: Perceptrons, Speech recognition, Natural languages
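
As a concrete illustration, here is a minimal sketch of trigger feature extraction for reranking an n-best hypothesis: a word-specific trigger fires when a hypothesis word re-occurred from earlier in the conversation, and a single back-off trigger generalizes across all such re-occurrences. The feature names are illustrative, not the paper's exact templates, which also include bigram triggers.

from collections import Counter

def trigger_features(hypothesis_words, history_words):
    """Fire trigger features for hypothesis words already seen earlier
    in the conversation."""
    seen = set(history_words)
    feats = Counter()
    for w in hypothesis_words:
        if w in seen:
            feats[f"trigger_unigram={w}"] += 1  # word-specific trigger
            feats["trigger_backoff"] += 1       # generalized trigger
    return feats

history = "i flew to boston last week".split()
hyp = "the flight to boston was late".split()
print(trigger_features(hyp, history))
# Counter({'trigger_backoff': 2, 'trigger_unigram=to': 1,
#          'trigger_unigram=boston': 1})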