The Use of Unlabeled Data in Predictive Modeling

doi:10.1214/088342307000000032

Open Access Journal Article

Feng Liang, +2 more

- 25 Oct 2007 -

arXiv: Methodology

TL;DR

The paper gives an overview of the fundamental statistical foundations for predictive modeling and the general questions associated with unlabeled data, highlighting the relevance of venerable concepts of sampling design and prior specification.

Abstract:

The incorporation of unlabeled data in regression and classification analysis is an increasing focus of the applied statistics and machine learning literatures, with a number of recent examples demonstrating the potential for unlabeled data to contribute to improved predictive accuracy. The statistical basis for this semisupervised analysis does not appear to have been well delineated; as a result, the underlying theory and rationale may be underappreciated, especially by nonstatisticians. There is also room for statisticians to become more fully engaged in the vigorous research in this important area of intersection of the statistical and computer sciences. Much of the theoretical work in the literature has focused, for example, on geometric and structural properties of the unlabeled data in the context of particular algorithms, rather than probabilistic and statistical questions. This paper overviews the fundamental statistical foundations for predictive modeling and the general questions associated with unlabeled data, highlighting the relevance of venerable concepts of sampling design and prior specification. This theory, illustrated with a series of central illustrative examples and two substantial real data analyses, shows precisely when, why and how unlabeled data matter.

Citations

Journal Article

Covariance-regularized regression and classification for high-dimensional problems

Daniela Witten, +1 more

- 20 Feb 2009 -

Journal of The Royal Statistical Society...

TL;DR: It is shown that ridge regression, the lasso and the elastic net are special cases of covariance‐regularized regression, and it is demonstrated that certain previously unexplored forms of covariance‐regularized regression can outperform existing methods in a range of situations.

Journal Article

Penalized model-based clustering with unconstrained covariance matrices.

Hui Zhou, +2 more

- 01 Jan 2009 -

Electronic Journal of Statistics

TL;DR: This article proposes a regularized Gaussian mixture model permitting a treatment of general covariance matrices, taking various dependencies into account, and derives an E-M algorithm utilizing the graphical lasso for parameter estimation, achieving better clustering and variable selection.

Journal Article

Melanoma Therapeutic Strategies that Select against Resistance by Exploiting MYC-Driven Evolutionary Convergence

Katherine R. Singleton, +22 more

- 05 Dec 2017 -

Cell Reports

TL;DR: It is discovered that major pathways of resistance converge to activate the transcription factor, c-MYC (MYC), and MYC-driven, BRAFi-resistant cells are hypersensitive to the inhibition of MYC synthetic lethal partners, including SRC family and c-KIT tyrosine kinases, as well as glucose, glutamine, and serine metabolic pathways.

Journal Article

Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications

Thomas Brendan Murphy, +2 more

- 01 Mar 2010 -

The Annals of Applied Statistics

TL;DR: A model-based discriminant analysis method that includes variable selection is presented; it outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.

Journal Article

Estimation of Gradients and Coordinate Covariation in Classification

Sayan Mukherjee, +1 more

- 01 Dec 2006 -

Journal of Machine Learning Research

TL;DR: An algorithm that simultaneously estimates a classification function as well as its gradient in the supervised learning framework to find salient variables and estimate how they covary is introduced.

References

Statistical learning theory

Vladimir Vapnik

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

Book

Kernel Methods for Pattern Analysis

John Shawe-Taylor, +1 more

TL;DR: This book provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.

Proceedings Article

Combining labeled and unlabeled data with co-training

Avrim Blum, +1 more

TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.

Journal Article

A Bayesian Analysis of Some Nonparametric Problems

Thomas S. Ferguson

- 01 Mar 1973 -

Annals of Statistics

TL;DR: In this article, a class of prior distributions, called Dirichlet process priors, is proposed for nonparametric problems, for which treatment of many non-parametric statistical problems may be carried out, yielding results that are comparable to the classical theory.
