
Showing papers by "Robert Tibshirani published in 1998"


Journal ArticleDOI
TL;DR: In this article, the authors discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together, similar to the Bradley-Terry method for paired comparisons.
Abstract: We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated data sets. Classifiers used include linear discriminants, nearest neighbors, adaptive nonlinear methods and the support vector machine.
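The coupling step can be sketched as a small iterative-scaling loop. The function below is a minimal sketch of the pairwise-coupling idea, assuming equal sample sizes for every pair of classes; `couple_pairwise` is an illustrative name, not from the paper.

```python
import numpy as np

def couple_pairwise(r, n_iter=1000, tol=1e-10):
    """Combine pairwise class-probability estimates into one probability
    vector. r[i, j] estimates P(class i | class is i or j). A minimal
    iterative-scaling sketch of the coupling idea, assuming equal
    sample sizes for every pair of classes."""
    K = r.shape[0]
    off = ~np.eye(K, dtype=bool)           # mask selecting i != j entries
    p = np.full(K, 1.0 / K)                # start from uniform probabilities
    for _ in range(n_iter):
        mu = p[:, None] / (p[:, None] + p[None, :])  # pairwise probs implied by p
        # multiplicative update: match observed r against the implied mu
        p_new = p * (r * off).sum(axis=1) / (mu * off).sum(axis=1)
        p_new /= p_new.sum()               # renormalize to a probability vector
        if np.abs(p_new - p).max() < tol:
            return p_new
        p = p_new
    return p
```

With consistent inputs r[i, j] = p[i] / (p[i] + p[j]), the true p is a fixed point of the update, so the loop recovers it from a uniform start.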

1,569 citations


Journal ArticleDOI
TL;DR: The paper studies the construction of confidence values and examines to what extent they approximate frequentist p-values and Bayesian a posteriori probabilities, and derives more accurate confidence levels using both frequentist and objective Bayesian approaches.
Abstract: In the problem of regions, we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive statistic from a set of data, notice an interesting feature and wish to assign a confidence level to that feature. For example, we compute a density estimate and notice that the estimate is bimodal. What confidence can we assign to bimodality? A natural way to measure confidence is via the bootstrap: we compute our descriptive statistic on a large number of bootstrap data sets and record the proportion of times that the feature appears. This seems like a plausible measure of confidence for the feature. The paper studies the construction of such confidence values and examines to what extent they approximate frequentist $p$-values and Bayesian a posteriori probabilities. We derive more accurate confidence levels using both frequentist and objective Bayesian approaches. The methods are illustrated with a number of examples, including polynomial model selection and estimating the number of modes of a density.
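The naive bootstrap confidence measure described in the abstract is easy to sketch: resample the data, recompute the feature, and report the hit rate. The function and its `feature` predicate below are illustrative names, not from the paper, and this is the simple measure before any of the paper's accuracy corrections.

```python
import numpy as np

def bootstrap_confidence(data, feature, n_boot=2000, seed=0):
    """Naive bootstrap confidence for a feature of the data: the fraction
    of bootstrap resamples on which the boolean statistic `feature` holds
    (e.g. 'the density estimate is bimodal', or 'the mean is positive')."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    hits = 0
    for _ in range(n_boot):
        # resample the data with replacement, same size as the original
        resample = rng.choice(data, size=data.shape[0], replace=True)
        hits += bool(feature(resample))
    return hits / n_boot
```

For example, `bootstrap_confidence(sample, lambda x: x.mean() > 0)` reports how often the resampled mean stays positive.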

139 citations


Journal ArticleDOI
TL;DR: A battery of modern, adaptive non-linear learning methods is applied to a large real database of cardiac patient data; none of the methods outperforms a relatively simple logistic regression model previously developed for this problem.
Abstract: We apply a battery of modern, adaptive non-linear learning methods to a large real database of cardiac patient data. We use each method to predict 30-day mortality from a large number of potential risk factors, and we compare their performances. We find that none of the methods could outperform a relatively simple logistic regression model previously developed for this problem.

88 citations


Journal ArticleDOI
TL;DR: In this article, a generalized estimating equations approach for longitudinal data is proposed to incorporate the flexibility of nonparametric smoothing, and the convergence of the estimating equations and consistency of the resulting solutions are discussed.
Abstract: We introduce a class of models for longitudinal data by extending the generalized estimating equations approach of Liang and Zeger (1986) to incorporate the flexibility of nonparametric smoothing. The algorithm provides a unified estimation procedure for marginal distributions from the exponential family. We propose pointwise standard-error bands and approximate likelihood-ratio and score tests for inference. The algorithm is formally derived by using the penalized quasilikelihood framework. Convergence of the estimating equations and consistency of the resulting solutions are discussed. We illustrate the algorithm with data on the population dynamics of Colorado potato beetles on potato plants.

42 citations


Journal ArticleDOI
TL;DR: A new method for regression trees which obtains estimates and predictions subject to constraints on the coefficients representing the effects of splits in the tree and for some problems gives better predictions than cost-complexity pruning used in the classification and regression tree (CART) algorithm.
Abstract: We investigate a new method for regression trees which obtains estimates and predictions subject to constraints on the coefficients representing the effects of splits in the tree. The procedure leads to both shrinking of the node estimates and pruning of branches in the tree and for some problems gives better predictions than cost-complexity pruning used in the classification and regression tree (CART) algorithm. The new method is based on the least absolute shrinkage and selection operator (LASSO) method developed by Tibshirani.
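The LASSO fit at the heart of the method can be sketched generically: represent each split's effect as a coefficient in a linear model and shrink the coefficients with an L1 penalty. The coordinate-descent solver below is a standard textbook sketch, not the paper's exact algorithm, and `lasso_cd` is an illustrative name.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """Coordinate-descent LASSO: minimize (1/2n)||y - Xb||^2 + lam*||b||_1.
    A generic solver sketch; in the tree setting the columns of X would be
    basis functions encoding the splits, so coefficients driven to zero
    correspond to pruned branches and shrunken ones to shrunken node
    estimates."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n            # per-coordinate curvature
    for _ in range(n_iter):
        for j in range(p):
            resid = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            rho = X[:, j] @ resid / n
            # soft-threshold update for coordinate j
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b
```

Setting `lam=0` recovers the least-squares fit, while increasing `lam` zeroes out weak coefficients, which is the shrinking-and-pruning behavior the abstract describes.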

24 citations


Journal ArticleDOI
TL;DR: This work considers two methods of making use of the coaching variables in order to improve the prediction of Y from x1, x2, …, xp.
Abstract: In a regression or classification setting where we wish to predict Y from x1, x2, …, xp, we suppose that an additional set of 'coaching' variables z1, z2, …, zm is available in our training sample. These might be variables that are difficult to measure, and they will not be available when we predict Y from x1, x2, …, xp in the future. We consider two methods of making use of the coaching variables in order to improve the prediction of Y from x1, x2, …, xp. The relative merits of these approaches are discussed and compared in a number of examples.

20 citations