Proceedings Article

Word Alignment Modeling with Context Dependent Deep Neural Network

01 Aug 2013 - Vol. 1, pp 166-175
TL;DR: A novel bilingual word alignment approach based on DNN (Deep Neural Network) that outperforms the HMM and IBM Model 4 baselines by 2 points in F-score while producing a very compact model with far fewer parameters.
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While capable of modeling the rich bilingual correspondence, our method generates a very compact model with far fewer parameters. Experiments on a large-scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM Model 4 baselines by 2 points in F-score.
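To make the modeling idea concrete, here is a minimal numpy sketch of a context-dependent lexical alignment scorer in the spirit the abstract describes: embeddings for a source word, a target word, and their surrounding context windows are concatenated and passed through a small network to produce an alignment score. All names, dimensions, and the random initialization are illustrative assumptions, not the paper's actual configuration.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's actual dimensions differ.
VOCAB, DIM, WIN, HIDDEN = 10000, 20, 2, 120

E_src = rng.normal(scale=0.1, size=(VOCAB, DIM))  # source embedding table
E_tgt = rng.normal(scale=0.1, size=(VOCAB, DIM))  # target embedding table
N_IN = (2 * WIN + 1) * 2 * DIM                    # two context windows, concatenated
W1 = rng.normal(scale=0.1, size=(HIDDEN, N_IN))
b1 = np.zeros(HIDDEN)
w2 = rng.normal(scale=0.1, size=HIDDEN)

def window(ids, i, pad=0):
    # word ids in a +/-WIN window around position i, padded at sentence boundaries
    return [ids[j] if 0 <= j < len(ids) else pad for j in range(i - WIN, i + WIN + 1)]

def lexical_score(src_ids, tgt_ids, i, j):
    # concatenate context-window embeddings for source position i and target position j
    x = np.concatenate([E_src[window(src_ids, i)].ravel(),
                        E_tgt[window(tgt_ids, j)].ravel()])
    h = np.clip(W1 @ x + b1, -1.0, 1.0)  # "hard" tanh activation (see eq. (2) below)
    return float(w2 @ h)                 # scalar translation score for the pair (i, j)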
Citations
Journal ArticleDOI
Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, Ming Zhou
TL;DR: This work develops a number of neural networks with tailored loss functions and applies sentiment embeddings to word-level sentiment analysis, sentence-level sentiment classification, and building sentiment lexicons, with results that consistently outperform context-based embeddings on several benchmark datasets for these tasks.
Abstract: We propose learning sentiment-specific word embeddings, dubbed sentiment embeddings, in this paper. Existing word embedding learning algorithms typically only use the contexts of words but ignore the sentiment of texts. This is problematic for sentiment analysis because words with similar contexts but opposite sentiment polarity, such as good and bad, are mapped to neighboring word vectors. We address this issue by encoding sentiment information of texts (e.g., sentences and words) together with contexts of words in sentiment embeddings. By combining context- and sentiment-level evidence, the nearest neighbors in sentiment embedding space are semantically similar, and words with the same sentiment polarity are favored. In order to learn sentiment embeddings effectively, we develop a number of neural networks with tailored loss functions, and collect massive texts automatically, with sentiment signals like emoticons, as the training data. Sentiment embeddings can be naturally used as word features for a variety of sentiment analysis tasks without feature engineering. We apply sentiment embeddings to word-level sentiment analysis, sentence-level sentiment classification, and building sentiment lexicons. Experimental results show that sentiment embeddings consistently outperform context-based embeddings on several benchmark datasets of these tasks. This work provides insights on the design of neural networks for learning task-specific word embeddings in other natural language processing tasks.
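As a rough illustration of what a tailored loss function can look like, the sketch below interpolates a context ranking loss (a real n-gram should outscore a corrupted one) with a sentiment ranking loss (the gold polarity should outscore the opposite one). The hinge margin of 1 and the mixing weight alpha are assumptions, not the paper's exact objective.

def sentiment_embedding_loss(f_ctx_true, f_ctx_corrupt,
                             f_sent_gold, f_sent_wrong, alpha=0.5):
    # context part: hinge loss ranking a real n-gram above a corrupted one
    loss_ctx = max(0.0, 1.0 - f_ctx_true + f_ctx_corrupt)
    # sentiment part: hinge loss ranking the gold polarity above the wrong one
    loss_sent = max(0.0, 1.0 - f_sent_gold + f_sent_wrong)
    # interpolate the two kinds of evidence; alpha is a hypothetical mixing weight
    return alpha * loss_ctx + (1.0 - alpha) * loss_sent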

290 citations


Cites background from "Word Alignment Modeling with Context Dependent Deep Neural Network"

  • ...Index Terms—Natural language processing, word embeddings, sentiment analysis, neural networks...


Proceedings ArticleDOI
Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, Sheng Li
01 Sep 2015
TL;DR: A novel hierarchical recurrent neural network language model (HRNNLM) for document modeling, which integrates a sentence-level RNN as history information into the word-level RNN to predict the word sequence with cross-sentence contextual information.
Abstract: This paper proposes a novel hierarchical recurrent neural network language model (HRNNLM) for document modeling. After establishing an RNN to capture the coherence between sentences in a document, HRNNLM integrates it as sentence history information into the word-level RNN to predict the word sequence with cross-sentence contextual information. A two-step training approach is designed, in which sentence-level and word-level language models are approximated for convergence in a pipeline style. Examined in the standard sentence reordering scenario, HRNNLM proves more accurate at modeling sentence coherence. At the word level, experimental results also indicate significantly lower model perplexity, followed by a practically better translation result when applied to a Chinese-English document translation reranking task.
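A minimal numpy sketch of the two-level structure, under assumed parameter names: a sentence-level RNN folds the preceding sentence vectors into a history vector, which then conditions every step of the word-level RNN before a softmax over the vocabulary. This illustrates the hierarchy only; it is not the authors' exact parameterization or training procedure.

import numpy as np

def hrnn_word_probs(prev_sent_vecs, word_vecs, p):
    # sentence-level RNN: fold preceding sentence vectors into a history vector
    h_s = np.zeros(p["Uh"].shape[0])
    for s in prev_sent_vecs:
        h_s = np.tanh(p["Us"] @ s + p["Uh"] @ h_s + p["ub"])
    # word-level RNN: the sentence history h_s conditions every word prediction
    h = np.zeros(p["Wh"].shape[0])
    probs = []
    for w in word_vecs:
        h = np.tanh(p["Wx"] @ w + p["Wh"] @ h + p["Wc"] @ h_s + p["wb"])
        logits = p["V"] @ h
        e = np.exp(logits - logits.max())
        probs.append(e / e.sum())  # softmax distribution over the vocabulary
    return probs

# usage with hypothetical sizes: 8-dim inputs, 16-dim hidden states, vocab of 100
rng = np.random.default_rng(0)
p = {"Us": rng.normal(size=(16, 8)), "Uh": rng.normal(size=(16, 16)), "ub": np.zeros(16),
     "Wx": rng.normal(size=(16, 8)), "Wh": rng.normal(size=(16, 16)),
     "Wc": rng.normal(size=(16, 16)), "wb": np.zeros(16), "V": rng.normal(size=(100, 16))}
probs = hrnn_word_probs([rng.normal(size=8)], [rng.normal(size=8)] * 3, p)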

183 citations


Cites methods from "Word Alignment Modeling with Context Dependent Deep Neural Network"

  • ...Yang et al. (2013) adapt and extend the CD-DNN-HMM (Dahl et al., 2012) model to the HMM-based word alignment model....


Journal ArticleDOI
TL;DR: An overview is given of DNN applications in various aspects of machine translation (MT).
Abstract: Deep neural networks (DNNs) are widely used in machine translation (MT). This article gives an overview of DNN applications in various aspects of MT.

180 citations


Additional excerpts

  • ...$\exp\left(\sum_i \lambda_i h_i(f, e)\right)$ (8) $y_3 = f(W^{(1)}[y_2; x_4] + b)$...


Proceedings ArticleDOI
Shujie Liu, Nan Yang, Mu Li, Ming Zhou
01 Jun 2014
TL;DR: A novel recursive recurrent neural network (R²NN) is proposed to model the end-to-end decoding process for statistical machine translation and can outperform the state-of-the-art baseline by about 1.5 points in BLEU.
Abstract: In this paper, we propose a novel recursive recurrent neural network (R²NN) to model the end-to-end decoding process for statistical machine translation. R²NN is a combination of recursive neural network and recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, like recurrent neural networks, so that language model and translation model can be integrated naturally; (2) a tree structure can be built, as recursive neural networks, so as to generate the translation candidates in a bottom-up manner. A semi-supervised training approach is proposed to train the parameters, and the phrase pair embedding is explored to model translation confidence directly. Experiments on a Chinese to English translation task show that our proposed R²NN can outperform the state-of-the-art baseline by about 1.5 points in BLEU.
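The structural point, that a parent node is built both from its two children (the recursive part) and from fresh input for the newly combined span (the recurrent-like part), can be sketched in a few lines. The concatenation layout and the tanh nonlinearity are assumptions.

import numpy as np

def r2nn_combine(h_left, h_right, x_new, W, b):
    # recursive part: the two child representations h_left, h_right;
    # recurrent-like part: new input features x_new for the combined span
    z = np.concatenate([h_left, h_right, x_new])
    return np.tanh(W @ z + b)  # parent representation, built bottom-up

# usage with hypothetical 16-dimensional representations
rng = np.random.default_rng(0)
W, b = rng.normal(size=(16, 16 * 3)), np.zeros(16)
parent = r2nn_combine(rng.normal(size=16), rng.normal(size=16), rng.normal(size=16), W, b)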

153 citations


Cites methods from "Word Alignment Modeling with Context Dependent Deep Neural Network"


  • ...Yang et al. (2013) adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method to HMM-based word alignment model....


  • ...With the trained monolingual word embedding, we follow (Yang et al., 2013) to get the bilingual word embedding using the IWSLT bilingual training data....


  • ...Using monolingual word embedding as the initialization, we fine tune them to get bilingual word embedding (Yang et al., 2013)....


  • ...Yang et al. (2013) adapt and extend CD-DNN-HMM (Dahl et al., 2012) to word alignment....


Proceedings ArticleDOI
Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, Chengqing Zong
01 Jun 2014
TL;DR: This work proposes Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish phrases with different semantic meanings.
Abstract: We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish phrases with different semantic meanings. The BRAE is trained in a way that minimizes the semantic distance of translation equivalents and maximizes the semantic distance of non-translation pairs simultaneously. After training, the model learns how to embed each phrase semantically in two languages and also learns how to transform the semantic embedding space in one language to the other. We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates. Extensive experiments show that the BRAE is remarkably effective in these two tasks.
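The bilingual constraint amounts to a max-margin objective: pull the embeddings of a phrase and its translation together while pushing a non-translation pair apart. Below is a simplified sketch of that term only; the full BRAE objective also includes the autoencoders' reconstruction errors and a cross-lingual transformation, which are omitted here.

import numpy as np

def brae_margin_loss(src_vec, tgt_vec, neg_tgt_vec, margin=1.0):
    # semantic distance between a phrase and its true translation...
    d_pos = np.sum((src_vec - tgt_vec) ** 2)
    # ...should be smaller, by a margin, than the distance to a non-translation phrase
    d_neg = np.sum((src_vec - neg_tgt_vec) ** 2)
    return max(0.0, margin + d_pos - d_neg)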

127 citations


Cites background from "Word Alignment Modeling with Context Dependent Deep Neural Network"

  • ...…statistical machine translation (SMT) community has seen a strong interest in adapting and applying DNN to many tasks, such as word alignment (Yang et al., 2013), translation confidence estimation (Mikolov et al., 2010; Liu et al., 2013; Zou et al., 2013), phrase reordering prediction (Li et…...



References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art performance on the ImageNet classification task.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
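Dropout, the regularization method the abstract mentions, randomly zeroes hidden units during training. The sketch below uses the now-common "inverted" formulation that rescales at training time; the original paper instead halved unit outputs at test time.

import numpy as np

def dropout(x, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
    # randomly zero each unit with probability p_drop during training
    if not train:
        return x
    mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
    return x * mask / (1.0 - p_drop)  # rescale so the expected activation is unchanged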

73,978 citations

Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, various gradient-based learning methods are reviewed and compared on a standard handwritten digit recognition task, convolutional neural networks are shown to outperform all other techniques, and a new learning paradigm, graph transformer networks (GTN), is proposed for globally training multi-module recognition systems.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations

Journal ArticleDOI
TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Abstract: We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
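A compact sketch of the greedy, layer-at-a-time recipe: train a Restricted Boltzmann Machine on the data with one step of contrastive divergence (CD-1), then feed its hidden activations to the next layer. Bias terms, the fine-tuning stage, and the hyperparameter values are simplifications or placeholders, not the paper's exact algorithm.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.1, epochs=5, rng=np.random.default_rng(0)):
    # one RBM layer trained with CD-1; bias terms omitted for brevity
    W = rng.normal(scale=0.01, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        for v0 in data:                                 # data rows in [0, 1]
            ph0 = sigmoid(v0 @ W)                       # hidden probabilities
            h0 = (rng.random(n_hidden) < ph0).astype(float)  # sampled hidden states
            v1 = sigmoid(W @ h0)                        # mean-field reconstruction
            ph1 = sigmoid(v1 @ W)
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))  # CD-1 update
    return W

def greedy_pretrain(data, layer_sizes):
    # learn the network one layer at a time, bottom-up
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(x, n_hidden)
        weights.append(W)
        x = sigmoid(x @ W)  # this layer's activations feed the next RBM
    return weights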

15,055 citations


"Word Alignment Modeling with Contex..." refers background or methods in this paper

  • ...For pretraining, Restricted Boltzmann Machine (RBM) (Hinton et al., 2006), auto-encoding (Bengio et al., 2007) and sparse coding (Lee et al., 2007) are proposed and popularly used....


  • ...DNN with unsupervised pre-training was first introduced by (Hinton et al., 2006) for the MNIST digit image classification problem, in which RBM was introduced as the layer-wise pre-trainer....


  • ...This trending topic, usually referred to under the name Deep Learning, was started by ground-breaking papers such as (Hinton et al., 2006), in which innovative training procedures of deep structures are proposed....



Book
01 Jan 2009
TL;DR: The motivations and principles of learning algorithms for deep architectures are discussed, in particular those exploiting as building blocks the unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Abstract: Can machine learning deliver AI? Theoretical results, inspiration from the brain and cognition, as well as machine learning experiments suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one would need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers, graphical models with many levels of latent variables, or in complicated propositional formulae re-using many sub-formulae. Each level of the architecture represents features at a different level of abstraction, defined as a composition of lower-level features. Searching the parameter space of deep architectures is a difficult task, but new algorithms have been discovered and a new sub-area has emerged in the machine learning community since 2006, following these discoveries. Learning algorithms such as those for Deep Belief Networks and other related unsupervised learning algorithms have recently been proposed to train deep architectures, yielding exciting results and beating the state-of-the-art in certain areas. Learning Deep Architectures for AI discusses the motivations for and principles of learning algorithms for deep architectures. By analyzing and comparing recent results with different learning algorithms for deep architectures, explanations for their success are proposed and discussed, highlighting challenges and suggesting avenues for future explorations in this area.

7,767 citations


"Word Alignment Modeling with Contex..." refers background in this paper

  • ...training trains the network one layer at a time, and helps to guide the parameters of the layer towards better regions in parameter space (Bengio, 2009)....


Journal Article
TL;DR: A unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling is proposed.
Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.

6,734 citations


"Word Alignment Modeling with Contex..." refers background or methods in this paper

  • ...We replicate the work in (Collobert et al., 2011) and train word embeddings for source and target languages from their monolingual corpus respectively....


  • ...(Collobert et al., 2011) and (Socher et al., 2011) further apply Recursive Neural Networks to address the structural prediction tasks such as tagging and parsing, and (Socher et al., 2012) explores the compositional aspect of word representations....



  • ...(Collobert et al., 2011) applied DNN on several NLP tasks, such as part-of-speech tagging, chunking, name entity recognition, semantic labeling and syntactic parsing, where they got similar or even better results than the state-of-the-art on these tasks....


  • ...Following (Collobert et al., 2011), we choose "hard" hyperbolic function as our activation function in this work: $\mathrm{htanh}(x) = \begin{cases} 1 & \text{if } x > 1 \\ -1 & \text{if } x < -1 \\ x & \text{otherwise} \end{cases}$ (2) If probabilistic interpretation is desired, a softmax layer (Bridle, 1990) can be used to do…...

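The piecewise activation in equation (2) above is just a clamp to [-1, 1]; a one-line sketch:

def htanh(x):
    # "hard" hyperbolic tangent, eq. (2): clip the input to [-1, 1]
    return max(-1.0, min(1.0, x))

assert htanh(2.3) == 1.0 and htanh(-5.0) == -1.0 and htanh(0.4) == 0.4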