Complex Linguistic Features for Text Classification: A Comprehensive Study
References
The Nature of Statistical Learning Theory
Machine learning
WordNet: an electronic lexical database
Term Weighting Approaches in Automatic Text Retrieval
Advances in kernel methods: support vector learning
Frequently Asked Questions (15)
Q2. What future works have the authors mentioned in the paper "Advanced structural representations for question classification and answer re-ranking"?
In the future, the authors will investigate the utility of PASs for similar tasks affected by noisy data and apply a true SVM re-ranker trained with the proposed advanced information.
Q3. What is the purpose of this paper?
In this paper, the authors extensively study advanced structural representations, namely parse trees, bag-of-words, Part-of-Speech tags and predicate argument structures for question classification and answer re-ranking.
Q4. How can the authors improve the representation of tree structures in learning machines?
Knowing that syntactic trees and PASs may improve the simple BOW representation, the authors now face the problem of representing tree structures in learning machines.
Q5. How did the authors gather the results of the answer classification experiment?
In order to gather more statistically significant data, the authors ran five-fold cross-validation, with the constraint that two pairs 〈q, a1〉 and 〈q, a2〉 associated with the same question q could not be split between training and testing.
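The grouping constraint described above can be enforced by assigning folds per question rather than per pair. A minimal sketch in plain Python (the data and two-fold split are illustrative, not the authors' actual corpus or setup):

```python
# Sketch: question-grouped k-fold splitting. Every <q, a> pair sharing the
# same question q is assigned to the fold of q, so no question is split
# between training and testing.

def grouped_folds(pairs, n_folds=5):
    """pairs: list of (question_id, answer_text) tuples."""
    questions = sorted({q for q, _ in pairs})
    fold_of = {q: i % n_folds for i, q in enumerate(questions)}
    folds = [[] for _ in range(n_folds)]
    for q, a in pairs:
        folds[fold_of[q]].append((q, a))
    return folds

# Toy data: q1 and q3 each have two candidate answers.
pairs = [("q1", "a1"), ("q1", "a2"), ("q2", "a1"),
         ("q3", "a1"), ("q3", "a2"), ("q4", "a1")]
folds = grouped_folds(pairs, n_folds=2)
```

Round-robin assignment over question ids keeps fold sizes roughly balanced while guaranteeing that all pairs for one question land together.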
Q6. What is the way to overcome this issue?
One way to overcome this issue is to try to capture semantic relations by processing shallow representations like predicate argument structures proposed in the PropBank (PB) project [9].
Q7. What is the advantage of parsing trees?
The advantage of computing sentence similarity based on parse trees with respect to purely lexical approaches is that trees provide structural relations hard to compute with other methods.
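As a rough illustration of why structure helps, two parse trees can be compared by counting the grammar productions they share, a much simpler proxy for the tree-kernel similarity used in the paper (the nested-tuple tree encoding and the example sentences are hypothetical):

```python
# Sketch: structural overlap between two parse trees, each encoded as a
# nested tuple (label, child, ...). Shared productions capture relations
# (e.g. the same VP expansion) that a bag of words cannot.

def productions(tree, acc=None):
    """Collect (parent_label, child_labels) productions of a tree."""
    if acc is None:
        acc = []
    if isinstance(tree, tuple):
        label, children = tree[0], tree[1:]
        acc.append((label, tuple(c[0] if isinstance(c, tuple) else c
                                 for c in children)))
        for c in children:
            productions(c, acc)
    return acc

t1 = ("S", ("NP", "John"), ("VP", ("V", "runs")))
t2 = ("S", ("NP", "Mary"), ("VP", ("V", "runs")))
shared = set(productions(t1)) & set(productions(t2))
```

Here the two trees differ lexically in the subject yet still share the S, VP, and V productions, which a purely lexical overlap measure would weight no differently from any other word match.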
Q8. What are the phases of question processing?
The retrieval and answer extraction phases consist of retrieving relevant documents [4] and selecting candidate answer passages [5,1] from them.
Q9. What does the new kernel function do?
Their experiments with Support Vector Machines and such new functions suggest that syntactic information helps specific tasks such as question classification.
Q10. What is the reason why the results are higher than the results of [2]?
Their higher results with respect to [2] are explained by a highly performing BOW, the use of parameterization, and, most importantly, the fact that their model is obtained by summing two separate kernel spaces (with separate normalization), since mixing BOW with tree kernels does not allow SVMs to exploit all of their representational power.
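The key design choice, summing two separately normalized kernel spaces, can be sketched as follows (the toy linear BOW and tree-fragment kernels over dict representations are illustrative stand-ins, not the paper's implementation):

```python
import math

def normalize(k):
    """K'(x, y) = K(x, y) / sqrt(K(x, x) * K(y, y)).
    Each kernel space is normalized on its own, so neither
    representation dominates the sum by sheer magnitude."""
    def k_norm(x, y):
        d = math.sqrt(k(x, x) * k(y, y))
        return k(x, y) / d if d else 0.0
    return k_norm

def summed_kernel(k_bow, k_tree):
    """Combine two kernels by summing their normalized values."""
    kb, kt = normalize(k_bow), normalize(k_tree)
    return lambda x, y: kb(x, y) + kt(x, y)

# Toy linear kernels over sparse dict features (hypothetical data).
k_bow = lambda x, y: sum(v * y["bow"].get(w, 0) for w, v in x["bow"].items())
k_tree = lambda x, y: sum(v * y["frag"].get(f, 0) for f, v in x["frag"].items())

x = {"bow": {"what": 1, "is": 1}, "frag": {"(WHNP (WP what))": 1}}
K = summed_kernel(k_bow, k_tree)
```

Because each component is normalized to 1 on identical inputs, the combined kernel of an example with itself is exactly 2, and the relative contribution of lexical versus structural similarity stays comparable across examples.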
Q11. How are the performance of the multi-classifier and the individual binary classifiers measured?
The performance of the multi-classifier and the individual binary classifiers are measured using accuracy and F1-measure, respectively.
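For reference, the two measures can be computed as follows on binary labels (the toy predictions below are invented, not the paper's results):

```python
# Sketch: accuracy for the multi-classifier, F1 for a binary classifier.

def accuracy(preds, gold):
    """Fraction of predictions matching the gold labels."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def f1(preds, gold, positive=1):
    """Harmonic mean of precision and recall on the positive class."""
    tp = sum(p == g == positive for p, g in zip(preds, gold))
    fp = sum(p == positive != g for p, g in zip(preds, gold))
    fn = sum(g == positive != p for p, g in zip(preds, gold))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

preds, gold = [1, 1, -1, -1], [1, -1, -1, 1]
```

F1 is the natural choice for the individual binary classifiers because it is insensitive to the large number of true negatives that dominate one-vs-rest setups.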
Q12. How many sentences were used in the answer extraction phase?
The authors collected a corpus containing 1123 sentences, 401 of which – labeled as “+1” – answered the question either concisely or with noise; the rest – labeled as “-1” – were either irrelevant to the question or contained hints relating to the question but could not be judged as valid answers.
Q13. What is the process of calculating the Jaccard similarity?
Each sentence in each document is compared to the question to compute the Jaccard similarity, which, in the answer extraction phase, is used to select the most relevant sentence.
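A minimal sketch of this word-overlap step (whitespace tokenization is a simplifying assumption, and the example question and sentences are invented):

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings:
    |A ∩ B| / |A ∪ B|."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

question = "what is the capital of France"
sentences = ["Paris is the capital of France",
             "France borders Spain"]

# Answer extraction keeps the sentence with the highest overlap.
best = max(sentences, key=lambda s: jaccard(question, s))
```

The first sentence shares five of seven distinct words with the question (5/7), while the second shares only one of eight, so the first is selected.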
Q14. How many words can be used to improve the representation of the PB?
It contains 300,000 words annotated with predicative information on top of the Penn Treebank 2 Wall Street Journal texts. The authors can improve this representation by substituting each argument with its most important word – often referred to as the semantic head – as in Figure 2.
Q15. What is the answer classification system?
On the other hand, the coarse-grained semantic information contained in the PAS gives promising results in answer classification, which suffers more from data sparseness.