Weakly Supervised POS Tagging without Disambiguation
References
LIBSVM: A library for support vector machines
Building a large annotated corpus of English: the Penn Treebank
Natural Language Processing (Almost) from Scratch
Class-based n-gram models of natural language
Solving multiclass learning problems via error-correcting output codes
Frequently Asked Questions (12)
Q2. What future work do the authors propose in "Weakly Supervised POS Tagging without Disambiguation"?
In the future, the authors will investigate other ways to generate the coding matrix for possible performance improvement.
Q3. What is the way to represent the context features of a target word?
To represent the context features of a target word, the authors concatenate the word embeddings of the immediately preceding word, the target word, and the immediately following word to form a 192-dimensional vector [wi−1, wi, wi+1], which serves as the feature vector of the target word.
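The concatenation described above can be sketched as follows. This is a minimal illustration, not the paper's code: the 64-dimensional per-word embedding size (so that 3 × 64 = 192), the tiny vocabulary, and the random embedding table are all assumptions made for the example.

```python
import numpy as np

EMB_DIM = 64  # assumed per-word embedding size so that 3 * 64 = 192

# Hypothetical embedding table; in practice these would be pre-trained vectors.
rng = np.random.default_rng(0)
vocab = {"the": 0, "dog": 1, "runs": 2}
embeddings = rng.standard_normal((len(vocab), EMB_DIM))

def context_feature(words, i):
    """Concatenate embeddings of [w_{i-1}, w_i, w_{i+1}] for target position i."""
    idxs = [vocab[words[i - 1]], vocab[words[i]], vocab[words[i + 1]]]
    return np.concatenate([embeddings[j] for j in idxs])

x = context_feature(["the", "dog", "runs"], 1)
print(x.shape)  # (192,)
```

The same window-and-concatenate pattern generalizes to wider contexts by adding more offsets to `idxs`.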
Q4. What is the way to train weakly-supervised POS taggers?
Although disambiguation is an intuitive and reasonable strategy for training weakly supervised POS taggers, its effectiveness is largely limited by errors introduced in earlier training iterations.
Q5. How did Brown et al. develop a n-gram model?
Brown et al. [1992] proposed an n-gram model over classes of words, optimizing the corpus probability p(w1|c1) ∏ᵢ₌₂ⁿ p(wi|ci) p(ci|ci−1) via greedy hierarchical clustering.
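The class-based corpus probability can be computed numerically once the class assignments and the two probability tables are fixed. The sketch below uses toy values chosen for illustration, not estimates from any real corpus:

```python
# Toy class assignments and probability tables (assumed values for illustration).
word_class = {"the": "D", "dog": "N", "runs": "V"}
p_word_given_class = {("the", "D"): 1.0, ("dog", "N"): 0.5, ("runs", "V"): 0.5}
p_class_given_prev = {("N", "D"): 0.8, ("V", "N"): 0.6}

def corpus_prob(words):
    """p(w1|c1) * prod_{i=2}^{n} p(wi|ci) * p(ci|c_{i-1})."""
    cs = [word_class[w] for w in words]
    p = p_word_given_class[(words[0], cs[0])]
    for i in range(1, len(words)):
        p *= p_word_given_class[(words[i], cs[i])] * p_class_given_prev[(cs[i], cs[i - 1])]
    return p

print(corpus_prob(["the", "dog", "runs"]))  # ≈ 0.12 = 1.0 * (0.5 * 0.8) * (0.5 * 0.6)
```

Brown clustering greedily merges classes so as to maximize this objective over the whole corpus; the function above is only the evaluation step of that search.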
Q6. What is the common way to address the problem of lack of annotated data?
One common way to address the problem of lack of annotated data is to make use of a dictionary of words with each one associated with a set of possible POS tags.
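A tag dictionary of this kind can be represented directly as a mapping from words to candidate tag sets. The entries below are hypothetical examples, not taken from the paper's dictionary; under weak supervision each token carries its whole candidate set rather than a single disambiguated tag:

```python
# Hypothetical tag dictionary: each word maps to its set of possible POS tags.
tag_dict = {
    "the": {"DT"},
    "flies": {"NNS", "VBZ"},
    "run": {"NN", "VB"},
}

def candidate_tags(sentence):
    """Attach the dictionary's candidate tag set to each token in the sentence."""
    return [(w, tag_dict.get(w, set())) for w in sentence]

labeled = candidate_tags(["the", "flies", "run"])
# Only "the" is unambiguous here; the other tokens keep all their candidates.
```

A weakly supervised learner then treats each ambiguous token as a partial-label example whose true tag is known only to lie inside its candidate set.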
Q7. What was the first method of clustering?
Based on prototype theory, Abend et al. [2010] first clustered the most frequent words using morphological representations.
Q8. What is the simplest way to represent the context features of a word?
To represent the context features of a target word, the authors concatenate the word embeddings of the immediately preceding word, the target word, and the immediately following word to form a 150-dimensional vector [wi−1, wi, wi+1], which serves as the feature vector of the target word.
Q9. What is the way to solve the deficiency of IP?
To address the deficiency of the IP formulation, Ravi et al. [2010] proposed a two-stage greedy minimization approach that runs much faster while maintaining tagging performance.
Q10. Why is it difficult to compare the effectiveness of different clustering methods?
As pointed out in [Christodoulopoulos et al. 2010], due to a lack of standard and informative evaluation techniques, it is difficult to compare the effectiveness of different clustering methods.
Q11. How is the accuracy of POS tagging on words with 3 possible tags?
The authors observe that the accuracy on words with two possible tags is below 90%, while the accuracy on words with three possible tags is around 90%.
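Breaking accuracy down by the size of each word's candidate tag set, as in this observation, amounts to a grouped accuracy computation. The records below are fabricated toy data for illustration only, not the paper's results:

```python
from collections import defaultdict

# Hypothetical records: (number of candidate tags, gold tag, predicted tag).
records = [
    (2, "NN", "NN"), (2, "VB", "NN"), (2, "NN", "NN"), (2, "VB", "VB"),
    (3, "JJ", "JJ"), (3, "NN", "NN"), (3, "VB", "JJ"),
]

def accuracy_by_ambiguity(records):
    """Group tagging accuracy by the size of each word's candidate tag set."""
    correct, total = defaultdict(int), defaultdict(int)
    for n_cands, gold, pred in records:
        total[n_cands] += 1
        correct[n_cands] += gold == pred
    return {k: correct[k] / total[k] for k in total}

print(accuracy_by_ambiguity(records))  # {2: 0.75, 3: 0.666...} on this toy data
```

Grouping by ambiguity level in this way makes it easy to check whether more ambiguous words are actually harder for the tagger.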
Q12. How did Kairit and others develop their approach for generating POS classes?
Kairit et al. [2014] presented an approach for inducing POS classes that combines morphological and distributional information in a non-parametric Bayesian generative model based on the distance-dependent Chinese restaurant process.