Yiming Yang

Researcher at Carnegie Mellon University

Publications -  311
Citations -  45622

Yiming Yang is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: Computer science & Language model. The author has an h-index of 61 and has co-authored 241 publications receiving 36653 citations. Previous affiliations of Yiming Yang include New Mexico State University & National Institutes of Health.

Papers
Proceedings Article

A Comparative Study on Feature Selection in Text Categorization

TL;DR: This paper finds strong correlations between the DF, IG, and CHI values of a term and suggests that DF thresholding, the simplest method with the lowest computational cost, can be reliably used instead of IG or CHI when computing these measures is too expensive.
Proceedings Article

XLNet: Generalized Autoregressive Pretraining for Language Understanding

TL;DR: The authors propose XLNet, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT.
Posted Content

XLNet: Generalized Autoregressive Pretraining for Language Understanding

TL;DR: XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
Proceedings Article

A re-examination of text categorization methods

TL;DR: The results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when the categories have over 300 instances.
Journal Article

RCV1: A New Benchmark Collection for Text Categorization Research

TL;DR: This work describes the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data.