Yiming Yang
Researcher at Carnegie Mellon University
Publications - 311
Citations - 45622
Yiming Yang is an academic researcher from Carnegie Mellon University whose research covers topics including computer science and language models. The author has an h-index of 61 and has co-authored 241 publications receiving 36653 citations. Previous affiliations of Yiming Yang include New Mexico State University and the National Institutes of Health.
Papers
Proceedings Article
A Comparative Study on Feature Selection in Text Categorization
Yiming Yang, Jan O. Pedersen +1 more
TL;DR: This paper finds strong correlations between the DF (document frequency), IG (information gain), and CHI (χ² statistic) values of a term, and suggests that DF thresholding, the simplest method with the lowest computational cost, can be reliably used instead of IG or CHI when computing those measures is too expensive.
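The DF thresholding idea from the entry above can be sketched in Python as follows. This is a minimal illustration, not the paper's code: it assumes documents are already tokenized into lists of terms, the function names `document_frequency`, `df_threshold`, and `chi_square` and the toy corpus are hypothetical, and IG is omitted for brevity. The χ² score uses the standard 2×2 term/category contingency table.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # set() so repeated terms in one doc count once
    return df


def df_threshold(docs, min_df):
    """Keep only terms whose document frequency is at least min_df."""
    df = document_frequency(docs)
    return {term for term, count in df.items() if count >= min_df}


def chi_square(docs, labels, term, category):
    """Chi-square between a term and a category from a 2x2 contingency table.

    a: docs in category containing term, b: docs outside category containing term,
    c: docs in category without term,    d: docs outside category without term.
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


# Hypothetical toy corpus: two finance and two sports documents.
docs = [
    ["stock", "market", "rises"],
    ["stock", "prices", "fall"],
    ["team", "wins", "match"],
    ["team", "loses", "match"],
]
labels = ["finance", "finance", "sports", "sports"]

selected = df_threshold(docs, min_df=2)
print(sorted(selected))  # ['match', 'stock', 'team']
print(chi_square(docs, labels, "stock", "finance"))  # 4.0
```

On this toy corpus the terms kept by DF thresholding are also the ones with the highest χ² scores ("stock" scores 4.0 for finance, while the singleton "market" scores about 1.33), a miniature echo of the correlation the paper reports rather than evidence for it.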
Proceedings Article
XLNet: Generalized Autoregressive Pretraining for Language Understanding
TL;DR: The authors propose XLNet, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, and that overcomes the limitations of BERT.
Posted Content
XLNet: Generalized Autoregressive Pretraining for Language Understanding
TL;DR: XLNet is proposed: a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, and that overcomes the limitations of BERT thanks to its autoregressive formulation.
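The permutation objective summarized in the XLNet entries above can be written as a worked formula. This is a sketch in the notation commonly used for XLNet, assumed here rather than quoted from the listing: Z_T is the set of all permutations of the positions 1, ..., T; z_t is the t-th element of a permutation z; and z_<t denotes its first t-1 elements.

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[ \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]
```

For example, with T = 3 and factorization order z = (3, 1, 2), the inner sum is log p(x_3) + log p(x_1 | x_3) + log p(x_2 | x_3, x_1). Each individual factorization stays autoregressive, but in expectation over orders every token is predicted from context on both sides.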
Proceedings Article
A re-examination of text categorization methods
Yiming Yang, Xin Liu +1 more
TL;DR: The results show that SVM, kNN, and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when categories have over 300 instances.
Journal Article
RCV1: A New Benchmark Collection for Text Categorization Research
TL;DR: This work describes the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data.