Word Alignment Modeling with Context Dependent Deep Neural Network
Citations
290 citations
Cites background from "Word Alignment Modeling with Contex..."
...Index Terms—Natural language processing, word embeddings, sentiment analysis, neural networks...
[...]
183 citations
Cites methods from "Word Alignment Modeling with Contex..."
...Yang et al. (2013) adapt and extend the CD-DNN-HMM (Dahl et al., 2012) model to the HMM-based word alignment model....
[...]
180 citations
Additional excerpts
...$\exp\left(\sum_{i} \lambda_i h_i(f, e')\right)$ (8) $y_3 = f(W^{(1)}[y_2; x_4] + b)$...
[...]
153 citations
Cites methods from "Word Alignment Modeling with Contex..."
...Yang et al. (2013) adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method to HMM-based word alignment model....
[...]
...With the trained monolingual word embedding, we follow (Yang et al., 2013) to get the bilingual word embedding using the IWSLT bilingual training data....
[...]
...Using monolingual word embedding as the initialization, we fine tune them to get bilingual word embedding (Yang et al., 2013)....
[...]
...Yang et al. (2013) adapt and extend CD-DNN-HMM (Dahl et al., 2012) to word alignment....
[...]
127 citations
Cites background from "Word Alignment Modeling with Contex..."
...Recently, statistical machine translation (SMT) community has seen a strong interest in adapting and applying DNN to many tasks, such as word alignment (Yang et al., 2013), translation confidence estimation (Mikolov et al., 2010; Liu et al., 2013; Zou et al., 2013), phrase reordering prediction (Li et…...
[...]
References
73,978 citations
42,067 citations
15,055 citations
"Word Alignment Modeling with Contex..." refers background or methods in this paper
...For pretraining, Restricted Boltzmann Machine (RBM) (Hinton et al., 2006), auto-encoding (Bengio et al., 2007) and sparse coding (Lee et al., 2007) are proposed and popularly used....
[...]
...DNN with unsupervised pre-training was firstly introduced by (Hinton et al., 2006) for MNIST digit image classification problem, in which, RBM was introduced as the layer-wise pre-trainer....
[...]
...This trending topic, usually referred under the name Deep Learning, is started by ground-breaking papers such as (Hinton et al., 2006), in which innovative training procedures of deep structures are proposed....
[...]
7,767 citations
"Word Alignment Modeling with Contex..." refers background in this paper
...training trains the network one layer at a time, and helps to guide the parameters of the layer towards better regions in parameter space (Bengio, 2009)....
[...]
6,734 citations
"Word Alignment Modeling with Contex..." refers background or methods in this paper
...We replicate the work in (Collobert et al., 2011) and train word embeddings for source and target languages from their monolingual corpus respectively....
[...]
...(Collobert et al., 2011) and (Socher et al., 2011) further apply Recursive Neural Networks to address the structural prediction tasks such as tagging and parsing, and (Socher et al., 2012) explores the compositional aspect of word representations....
[...]
...(Collobert et al., 2011) applied DNN on several NLP tasks, such as part-of-speech tagging, chunking, name entity recognition, semantic labeling and syntactic parsing, where they got similar or even better results than the state-of-the-art on these tasks....
[...]
...Following (Collobert et al., 2011), we choose “hard” hyperbolic function as our activation function in this work:
\[ \mathrm{htanh}(x) = \begin{cases} 1 & \text{if } x > 1 \\ -1 & \text{if } x < -1 \\ x & \text{otherwise} \end{cases} \tag{2} \]
If probabilistic interpretation is desired, a softmax layer (Bridle, 1990) can be used to do…...
[...]
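The "hard" hyperbolic tangent in the last excerpt, and the softmax layer it mentions, can be illustrated with a minimal Python sketch. The function names `htanh` and `softmax` and the list-based interface are assumptions made for illustration; they are not taken from the cited paper, and the max-subtraction step is a common numerical-stability convention rather than something the excerpt specifies.

```python
import math


def htanh(x: float) -> float:
    """Hard hyperbolic tangent: identity on [-1, 1], clipped outside it."""
    if x > 1.0:
        return 1.0
    if x < -1.0:
        return -1.0
    return x


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution.

    Subtracting the maximum score before exponentiating does not change
    the result but avoids overflow for large inputs (illustrative choice).
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical layer outputs, passed through htanh and then softmax.
    hidden = [htanh(v) for v in (-2.0, 0.3, 1.7)]
    print(hidden)
    print(softmax(hidden))
```

Because `htanh` is piecewise linear, it avoids evaluating an exponential at every unit, which is presumably part of its appeal as a cheap stand-in for `tanh`.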