Deep Learning for Chinese Word Segmentation and POS Tagging
Citations
2,466 citations
1,170 citations
Cites background from "Deep Learning for Chinese Word Segm..."
...Recent work has shown that large improvements in terms of model accuracy can be obtained by performing unsupervised pre-training of word embeddings (Collobert et al., 2011; Luong et al., 2013; Zheng et al., 2013; Socher et al., 2013a)....
[...]
1,157 citations
Cites background from "Deep Learning for Chinese Word Segm..."
...It is meaningful for some tasks such as pos-tagging (Zheng et al., 2013) as the two words have similar usages and grammatical roles, but it becomes a disaster for sentiment analysis as they have the opposite sentiment polarity....
[...]
997 citations
627 citations
Cites background from "Deep Learning for Chinese Word Segm..."
...Recent work has shown that large improvements in terms of model accuracy can be obtained by performing unsupervised pre-training of word embeddings (Collobert et al., 2011; Luong et al., 2013; Zheng et al., 2013; Socher et al., 2013)....
[...]
References
13,190 citations
"Deep Learning for Chinese Word Segm..." refers to methods in this paper
...In fact, a CRF maximizes the same log-likelihood (Lafferty et al., 2001) by using a linear model instead of a nonlinear neural network....
[...]
6,734 citations
"Deep Learning for Chinese Word Segm..." refers to background, methods, or results in this paper
...In order to make learning algorithms less dependent on feature engineering, we chose to use a variant of the neural network architecture first proposed by (Bengio et al., 2003) for probabilistic language modeling, and reintroduced later by (Collobert et al., 2011) for multiple NLP tasks....
[...]
...Taking the log, the conditional probability of the true path t is given by: log p(t|c, θ) = s(c, t, θ) − log ∑_{t′} exp{s(c, t′, θ)} (10). We did not use the stochastic gradient ascent algorithm (Bottou, 1991) to train the network as (Collobert et al., 2011) did....
[...]
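The excerpt above describes a sentence-level log-likelihood: the score of the gold tag path is normalized by a log-sum-exp over all candidate paths. A minimal sketch of that computation on a toy tag set (the scores and their emission/transition decomposition here are illustrative assumptions, not the paper's model):

```python
import itertools
import math

# Toy setup: 3 tags, a sentence of length 2. The path score is assumed
# to decompose into emission + transition terms; all values are made up.
TAGS = ["B", "I", "E"]
emission = {0: {"B": 1.0, "I": 0.2, "E": 0.1},
            1: {"B": 0.1, "I": 0.5, "E": 1.2}}
transition = {("B", "I"): 0.8, ("B", "E"): 0.6}  # unlisted pairs score 0

def path_score(path):
    s = sum(emission[i][t] for i, t in enumerate(path))
    s += sum(transition.get(pair, 0.0) for pair in zip(path, path[1:]))
    return s

def log_prob(true_path):
    # log p(t|c) = s(c, t) - log sum_{t'} exp{s(c, t')}   (Eq. 10)
    log_z = math.log(sum(math.exp(path_score(p))
                         for p in itertools.product(TAGS, repeat=2)))
    return path_score(true_path) - log_z

print(log_prob(("B", "E")))
```

Enumerating all paths is exponential in sentence length; in practice the normalizer is computed with a forward-style dynamic program.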
...Following (Bengio et al., 2003; Collobert et al., 2011), we want semantically and syntactically similar characters to be close in the embedding space....
[...]
2,221 citations
"Deep Learning for Chinese Word Segm..." refers to background or methods in this paper
...As an alternative to the maximum-likelihood method, we propose the following training algorithm inspired by the work of (Collins, 2002)....
[...]
...Intuitively it can be achieved by combining the convergence theorems for the perceptron applied to the tagging problem from (Collins, 2002) with the convergence results for the backpropagation algorithm from (Rumelhart et al....
[...]
...Note that the perceptron algorithm of (Collins, 2002) was designed for discriminatively training an...
[...]
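The excerpts above refer to a training scheme inspired by Collins' (2002) structured perceptron: decode the best-scoring tag path, then shift weights toward the gold path's features and away from the prediction's. A rough sketch under assumed indicator features and a brute-force argmax (a real implementation would use Viterbi decoding):

```python
import itertools

# Hypothetical tag inventory and features; not the paper's actual setup.
TAGS = ["B", "I", "E"]

def features(chars, path):
    # Indicator features: (char, tag) pairs plus tag bigrams.
    feats = [(c, t) for c, t in zip(chars, path)]
    feats += list(zip(path, path[1:]))
    return feats

def predict(weights, chars):
    # Brute-force argmax over all tag paths (Viterbi in practice).
    return max(itertools.product(TAGS, repeat=len(chars)),
               key=lambda p: sum(weights.get(f, 0.0) for f in features(chars, p)))

def perceptron_update(weights, chars, gold):
    pred = predict(weights, chars)
    if pred != tuple(gold):
        for f in features(chars, gold):
            weights[f] = weights.get(f, 0.0) + 1.0
        for f in features(chars, pred):
            weights[f] = weights.get(f, 0.0) - 1.0
    return weights

w = {}
for _ in range(5):
    perceptron_update(w, "ab", ("B", "E"))
print(predict(w, "ab"))
```

After the first mistaken prediction the update raises the gold path's features above the competitors, so subsequent passes leave the weights unchanged.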
1,409 citations
"Deep Learning for Chinese Word Segm..." refers to background in this paper
...Several works have investigated how to use deep learning for NLP applications (Bengio et al., 2003; Collobert et al., 2011; Collobert, 2011; Socher et al., 2011)....
[...]