Semi-supervised sequence tagging with bidirectional language models
Citations
Cites background or methods from "Semi-supervised sequence tagging with bidirectional language models":
...ELMo (Peters et al., 2017) generalizes traditional word embedding research along a different dimension....
...Language model pre-training has been shown to be effective for improving many natural language processing tasks (Dai and Le, 2015; Peters et al., 2017, 2018; Radford et al., 2018; Howard and Ruder, 2018)....
References
"Semi-supervised sequence tagging with bidirectional language models" refers to background in this paper:
..., 2014) or Long Short-Term Memory units (LSTM) (Hochreiter and Schmidhuber, 1997) depending on the task....
"Semi-supervised sequence tagging with bidirectional language models" refers to background in this paper:
...Many prior studies have shown that they capture useful semantic and syntactic information (Mikolov et al., 2013; Pennington et al., 2014) and including them in NLP systems has been shown to be enormously helpful for a variety of downstream tasks (Collobert et al., 2011)....
"Semi-supervised sequence tagging with bidirectional language models" refers to background or methods in this paper:
...Accordingly, we add another layer with parameters for each label bigram, computing the sentence conditional random field (CRF) loss (Lafferty et al., 2001) using the forward-backward algorithm at training time, and using the Viterbi algorithm to find the most likely tag sequence at test time, similar to Collobert et al....
...Instead of using an LM, Li and McCallum (2005) use a probabilistic generative model to infer context-sensitive latent variables for each token, which are then used as extra features in a supervised CRF tagger (Lafferty et al., 2001)....
...However, many other sequence tagging models have been proposed in the literature for this class of problems (e.g., Lafferty et al., 2001; Collins, 2002)....
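The excerpts above describe the paper's inference setup: a linear-chain CRF layer with one parameter per label bigram, trained via the forward-backward algorithm and decoded with Viterbi. As a rough illustration only (not the authors' code), the NumPy sketch below computes the log-partition term of the CRF loss with the forward recursion and recovers the most likely tag sequence with Viterbi; the `emissions` and `transitions` arrays are assumed stand-ins for the tagger's per-token label scores and label-bigram parameters.

```python
# Minimal NumPy sketch (not the authors' implementation) of linear-chain CRF
# inference with one transition parameter per label bigram. `emissions` and
# `transitions` are assumed stand-ins for the tagger's scores.
import numpy as np

def crf_log_partition(emissions: np.ndarray, transitions: np.ndarray) -> float:
    """Forward algorithm: log-sum-exp of the scores of all label sequences.

    emissions:   (seq_len, num_labels) per-token label scores.
    transitions: (num_labels, num_labels) score for each label bigram i -> j.
    """
    alpha = emissions[0]  # log-scores of all length-1 prefixes
    for t in range(1, len(emissions)):
        # For each current label, log-sum-exp over the previous label.
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    return float(m + np.log(np.exp(alpha - m).sum()))

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    """Viterbi algorithm: most likely label sequence under the same scores."""
    seq_len, num_labels = emissions.shape
    score = emissions[0]
    backptr = np.zeros((seq_len, num_labels), dtype=int)
    for t in range(1, seq_len):
        cand = score[:, None] + transitions      # best previous label ...
        backptr[t] = cand.argmax(axis=0)         # ... for each current label
        score = cand.max(axis=0) + emissions[t]
    best = [int(score.argmax())]                 # best final label
    for t in range(seq_len - 1, 0, -1):          # follow back-pointers
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

# Example with random scores: 5 tokens, 3 labels.
rng = np.random.default_rng(0)
em, tr = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))
print(viterbi_decode(em, tr), crf_log_partition(em, tr))
```

At training time the CRF loss would subtract the gold sequence's score from this log-partition term; the forward-backward pass makes that loss differentiable with respect to both the emission and bigram-transition parameters.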