
Showing papers by "Taro Watanabe published in 2012"


Proceedings Article
12 Jul 2012
TL;DR: A novel method for lexicon extraction that extracts translation pairs from comparable corpora using graph-based label propagation, achieving improved performance by clustering synonyms into the same translation.
Abstract: This paper proposes a novel method for lexicon extraction that extracts translation pairs from comparable corpora by using graph-based label propagation. In previous work, it was established that performance drastically decreases when the coverage of a seed lexicon is small. We resolve this problem by utilizing indirect relations with the bilingual seeds together with direct relations, in which each word is represented by a distribution of translated seeds. The seed distributions are propagated over a graph representing relations among words, and translation pairs are extracted by identifying word pairs with a high similarity in the seed distributions. We propose two types of graphs: a co-occurrence graph, representing co-occurrence relations between words, and a similarity graph, representing context similarities between words. Evaluations using English and Japanese patent comparable corpora show that our proposed graph propagation method outperforms conventional methods. Further, the similarity graph achieved improved performance by clustering synonyms into the same translation.
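The propagation step described above can be sketched in a few lines. The function names, the mixing parameter `alpha`, and the cosine threshold below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def propagate_seed_distributions(adj, seed_dists, alpha=0.5, iters=20):
    """Propagate seed-translation distributions over a word graph.

    adj        : (n, n) nonnegative edge weights between words
    seed_dists : (n, k) initial distributions over k seed translations
                 (zero rows for words with no seed information)
    alpha      : mixing weight between neighbour mass and the initial labels
    """
    # Row-normalize the adjacency matrix into a transition matrix.
    row_sums = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    dists = seed_dists.copy()
    for _ in range(iters):
        dists = alpha * trans @ dists + (1 - alpha) * seed_dists
    return dists

def extract_pairs(src_dists, tgt_dists, threshold=0.9):
    """Pair source/target words whose propagated seed distributions have
    cosine similarity above a threshold."""
    pairs = []
    for i, s in enumerate(src_dists):
        for j, t in enumerate(tgt_dists):
            denom = np.linalg.norm(s) * np.linalg.norm(t)
            if denom > 0 and s @ t / denom >= threshold:
                pairs.append((i, j))
    return pairs
```

A word with no seed entry (zero row) still acquires a distribution through its graph neighbours, which is how the sketch captures the paper's use of indirect relations.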

73 citations


Proceedings Article
12 Jul 2012
TL;DR: This paper proposes a method for learning a discriminative parser for machine translation reordering using only aligned parallel text by treating the parser's derivation tree as a latent variable in a model that is trained to maximize reordering accuracy.
Abstract: This paper proposes a method for learning a discriminative parser for machine translation reordering using only aligned parallel text. This is done by treating the parser's derivation tree as a latent variable in a model that is trained to maximize reordering accuracy. We demonstrate that efficient large-margin training is possible by showing that two measures of reordering accuracy can be factored over the parse tree. Using this model in the pre-ordering framework results in significant gains in translation accuracy over standard phrase-based SMT and previously proposed unsupervised syntax induction methods.
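Why the factoring claim makes large-margin training tractable: each source-word pair is ordered by exactly one node of a binary bracketing (the pair's lowest common ancestor), so a pairwise ordering measure decomposes into per-node contributions. A minimal sketch of that decomposition (the tree encoding and names are ours, not the paper's):

```python
from itertools import product

def concordant_pairs(order):
    """Directly count source-word pairs (i < j) that keep the same relative
    order on the target side; order[i] is word i's target position."""
    n = len(order)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if order[i] < order[j])

def tree_score(tree, order):
    """Score a binary bracketing by summing, at each node, the concordant
    pairs that cross between its two children.  A tree is a nested tuple:
    leaf = word index, internal node = (left_subtree, right_subtree)."""
    def walk(t):
        if isinstance(t, int):
            return [t], 0
        left, right = t
        ls, lscore = walk(left)
        rs, rscore = walk(right)
        cross = sum(1 for i, j in product(ls, rs) if order[i] < order[j])
        return ls + rs, lscore + rscore + cross
    _, score = walk(tree)
    return score
```

Because every pair crosses exactly one node, `tree_score` agrees with the direct pairwise count for any binary bracketing, which is the property that lets the loss be computed inside dynamic programming over parses.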

73 citations


Proceedings Article
08 Jul 2012
TL;DR: This paper demonstrates that accurate machine translation is possible without the concept of "words," treating MT as a problem of transformation between character strings, and proposes a look-ahead parsing algorithm and substring-informed prior probabilities to achieve more effective and efficient alignment.
Abstract: In this paper, we demonstrate that accurate machine translation is possible without the concept of "words," treating MT as a problem of transformation between character strings. We achieve this result by applying phrasal inversion transduction grammar alignment techniques to character strings to train a character-based translation model, and using this in the phrase-based MT framework. We also propose a look-ahead parsing algorithm and substring-informed prior probabilities to achieve more effective and efficient alignment. In an evaluation, we demonstrate that character-based translation can achieve results that compare to word-based systems while effectively translating unknown and uncommon words over several language pairs.

44 citations


Proceedings Article
12 Jul 2012
TL;DR: This paper proposes a novel local training method that significantly outperforms MERT, with maximal improvements of up to 2.0 BLEU points, while its efficiency is comparable to that of the global method.
Abstract: In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight vector on given development data. However, due to the diversity and uneven distribution of source sentences, this method suffers from two problems. First, its performance is highly dependent on the choice of the development set, which may lead to unstable test performance. Second, translations become inconsistent at the sentence level, since tuning is performed globally at the document level. In this paper, we propose a novel local training method to address these two problems. Unlike a global training method such as MERT, in which a single weight vector is learned and used for all input sentences, we perform training and testing in one step by learning a sentence-wise weight for each input sentence. We propose efficient incremental training methods to put local training into practice. In NIST Chinese-to-English translation tasks, our local training method significantly outperforms MERT, with maximal improvements of up to 2.0 BLEU points, while its efficiency is comparable to that of the global method.
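As a rough illustration of a sentence-wise weight, tuning can be localized by borrowing weights from similar development sentences. This nearest-neighbour sketch is our simplification of the idea, not the paper's incremental algorithm; the overlap measure and `k` are assumptions:

```python
def local_weight(test_sent, dev_pool, k=3):
    """Return a weight vector localized to one test sentence.

    test_sent : list of tokens
    dev_pool  : list of (sentence_tokens, weight_vector) pairs, where each
                weight vector was tuned with that development sentence in focus
    """
    def overlap(a, b):
        # Jaccard word overlap as a crude similarity between sentences.
        a, b = set(a), set(b)
        return len(a & b) / max(1, len(a | b))

    # Average the weights of the k most similar development sentences.
    nearest = sorted(dev_pool, key=lambda p: -overlap(test_sent, p[0]))[:k]
    dim = len(nearest[0][1])
    return [sum(w[i] for _, w in nearest) / len(nearest) for i in range(dim)]
```

The point of the sketch is only that different test sentences end up decoded with different weight vectors, in contrast to MERT's single global vector.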

29 citations


Proceedings Article
03 Jun 2012
TL;DR: This work proposes a variant of SGD with a larger batch size, in which the parameter update in each iteration is further optimized by a passive-aggressive algorithm, and reports significantly better translation results.
Abstract: We present an online learning algorithm for statistical machine translation (SMT) based on stochastic gradient descent (SGD). Under the online setting of rank learning, a corpus-wise loss has to be approximated by a batch local loss when optimizing for evaluation measures that cannot be linearly decomposed into a sentence-wise loss, such as BLEU. We propose a variant of SGD with a larger batch size in which the parameter update in each iteration is further optimized by a passive-aggressive algorithm. Learning is efficiently parallelized and line search is performed in each round when merging parameters across parallel jobs. Experiments on the NIST Chinese-to-English Open MT task indicate significantly better translation results.
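A passive-aggressive refinement of a parameter update can be sketched as a standard PA-I step; the function signature and the aggressiveness cap `C` below are assumptions for illustration, not the paper's exact update:

```python
import numpy as np

def pa_update(w, feats_oracle, feats_pred, loss, C=1.0):
    """PA-I update: move w just enough that the oracle translation outscores
    the current prediction by a margin equal to the observed loss, with the
    step size capped at C."""
    delta = feats_oracle - feats_pred
    margin = w @ delta                 # current score difference
    hinge = max(0.0, loss - margin)    # margin violation
    sq = delta @ delta
    if sq == 0 or hinge == 0:
        return w                       # constraint satisfied: stay passive
    tau = min(C, hinge / sq)           # aggressive but bounded step
    return w + tau * delta
```

After one update the margin constraint holds, so re-applying the same example is a no-op; that conservatism is what makes the correction to each SGD batch step stable.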

24 citations


Proceedings Article
08 Jul 2012
TL;DR: This paper presents a novel top-down head-driven parsing algorithm for data-driven projective dependency analysis that handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms.
Abstract: This paper presents a novel top-down head-driven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.

7 citations


Journal Article (DOI)
TL;DR: This work presents a method to directly learn this phrase table from a parallel corpus of sentences that are not aligned at the word level, through the use of non-parametric Bayesian methods and inversion transduction grammars.
Abstract: The phrase table, a scored list of bilingual phrases, lies at the center of phrase-based machine translation systems. We present a method to directly learn this phrase table from a parallel corpus of sentences that are not aligned at the word level. The key contribution of this work is that while previous methods have generally only modeled phrases at one level of granularity, in the proposed method phrases of many granularities are included directly in the model. This allows for the direct learning of a phrase table that achieves competitive accuracy without the complicated multi-step process of word alignment and phrase extraction that is used in previous research. The model is achieved through the use of non-parametric Bayesian methods and inversion transduction grammars (ITGs), a variety of synchronous context-free grammars (SCFGs). Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the more traditional two-step word alignment/phrase extraction approach while reducing its phrase table to a fraction of its original size.
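The two ITG concatenation rules at the heart of such a model, straight (target order preserved) and inverted (target order swapped), can be sketched directly on phrase pairs; the helper name is ours:

```python
def itg_compose(pair_a, pair_b, inverted=False):
    """Combine two (source, target) phrase pairs under an ITG rule.

    Straight concatenation keeps the target sides in the same order as the
    source sides; inverted concatenation swaps them, which is how ITGs model
    reordering between languages.
    """
    (src_a, tgt_a), (src_b, tgt_b) = pair_a, pair_b
    if inverted:
        return (src_a + src_b, tgt_b + tgt_a)
    return (src_a + src_b, tgt_a + tgt_b)
```

Because composed pairs are themselves phrase pairs, the same rule applies at every granularity, which is the property the multi-granularity model exploits.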

4 citations


Proceedings Article
01 Dec 2012
TL;DR: This work proposes an alternative tuning method based on an ultraconservative update, in which a combination of an expected task loss and the distance from the parameters of the previous round is minimized with a variant of gradient descent.
Abstract: Minimum error rate training (MERT) is a popular method for parameter tuning in statistical machine translation (SMT). However, its optimization objective function may change drastically at each optimization step, which may induce instability. We propose an alternative tuning method based on an ultraconservative update, in which a combination of an expected task loss and the distance from the parameters of the previous round is minimized with a variant of gradient descent. Experiments on test datasets of both Chinese-to-English and Spanish-to-English translation show that our method achieves improvements over MERT under the Moses system.
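The ultraconservative objective, an expected loss plus a squared distance to the previous round's parameters, can be sketched with plain gradient descent. The regularization weight `lam`, step size, and function names are illustrative assumptions:

```python
import numpy as np

def ultraconservative_step(w_prev, grad_expected_loss, lam=1.0, lr=0.1, steps=100):
    """One tuning round: minimize  L(w) + lam * ||w - w_prev||^2  by gradient
    descent, where grad_expected_loss(w) returns the gradient of the expected
    task loss L at w.  The quadratic term keeps the new weights close to the
    previous round's, damping the step-to-step swings seen with MERT."""
    w = w_prev.copy()
    for _ in range(steps):
        g = grad_expected_loss(w) + 2 * lam * (w - w_prev)
        w -= lr * g
    return w
```

With a quadratic loss L(w) = ||w - t||^2 and lam = 1 the minimizer is the midpoint (t + w_prev) / 2, which makes the "ultraconservative" compromise between fitting the loss and staying near the old parameters easy to see.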

2 citations


Journal Article (DOI)
TL;DR: This paper proposes to reorder the arguments, but not the predicates, of Japanese sentences using a dependency structure, and applies a discriminative re-ranking approach with Ranking Support Vector Machines to re-score the multiple reordered phrase translations.
Abstract: While phrase-based statistical machine translation systems prefer to translate with longer phrases, this may cause errors in a free word order language such as Japanese, in which the order of the arguments of a predicate is not solely determined by the predicate, and the arguments can be placed quite freely in the text. In this paper, we propose to reorder the arguments, but not the predicates, of Japanese sentences using a dependency structure. Instead of a single deterministically given permutation, we generate multiple reordered phrases for each sentence and translate them independently. We then apply a discriminative re-ranking method based on Ranking Support Vector Machines (SVMs) to re-score the multiple reordered phrase translations. In our experiments on the travel domain corpus BTEC, we gain a 1.22% BLEU score improvement when only the 1-best is used for re-ranking, and a 4.12% BLEU score improvement when the n-best is used, for Japanese-English translation.
Key words: predicate-argument structure, reordering, paraphrasing, re-ranking, statistical machine translation