Showing papers in "arXiv: Statistics Theory in 2004"


Journal ArticleDOI
TL;DR: A rejoinder to ``Least angle regression'' by Efron et al. [math.ST/0406456].
Abstract: Rejoinder to ``Least angle regression'' by Efron et al. [math.ST/0406456]

1,237 citations


Journal ArticleDOI
TL;DR: Least Angle Regression (LARS) is a new model selection algorithm, a useful and less greedy version of traditional forward selection methods; simple modifications of LARS implement the Lasso and Forward Stagewise linear regression.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method;

547 citations
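
The Forward Stagewise method named in the abstract above can be illustrated with a minimal sketch. This is the textbook incremental form (repeatedly nudge the coefficient of the predictor most correlated with the current residual), not the LARS modification the abstract refers to. The function name `forward_stagewise`, the parameters `step` and `n_iter`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def forward_stagewise(X, y, step=0.01, n_iter=5000):
    """Incremental forward stagewise regression (illustrative sketch).

    Assumes the columns of X are centered and scaled to unit norm.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    residual = y - y.mean()
    for _ in range(n_iter):
        corr = X.T @ residual              # inner products with the current residual
        j = int(np.argmax(np.abs(corr)))   # most correlated predictor
        if corr[j] == 0.0:
            break                          # residual is orthogonal to every predictor
        delta = step * np.sign(corr[j])    # small step in the direction of the correlation
        beta[j] += delta
        residual -= delta * X[:, j]
    return beta

# Synthetic, noise-free example (hypothetical data).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
true_beta = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ true_beta
beta_hat = forward_stagewise(X, y)
```

With a small `step` and enough iterations the coefficients approach the least squares solution; stopping early gives a shrunken, sparser fit, which is the sense in which the method is "less greedy" than classical forward selection.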


Posted Content
TL;DR: In this article, the authors proposed an estimator, the Multi-scale Realized Volatility (MSRV), which converges to the true volatility at the rate of n^{-1/4}, which is the best attainable.
Abstract: With the availability of high frequency financial data, nonparametric estimation of volatility of an asset return process becomes feasible. A major problem is how to estimate the volatility consistently and efficiently, when the observed asset returns contain error or noise, for example, in the form of microstructure noise. The former (consistency) has been addressed heavily in the recent literature, however, the resulting estimator is not quite efficient. In Zhang, Mykland and Ait-Sahalia (2003), the best estimator converges to the true volatility only at the rate of n^{-1/6}. In this paper, we propose an estimator, the Multi-scale Realized Volatility (MSRV), which converges to the true volatility at the rate of n^{-1/4}, which is the best attainable. We have shown a central limit theorem for the MSRV estimator, which permits setting intervals for the true integrated volatility on the basis of MSRV.

459 citations


Journal ArticleDOI
TL;DR: Turlach discusses the new variable selection method (LARS) proposed by Efron et al. and asks how to select linear models that conform with the marginality principle, arguing that LARS can be modified to incorporate this principle more readily than the Lasso or the nonnegative garrote.
Abstract: DISCUSSION OF “LEAST ANGLE REGRESSION” BY EFRON ET AL. By Berwin A. Turlach, University of Western Australia. I would like to begin by congratulating the authors (referred to below as EHJT) for their interesting paper in which they propose a new variable selection method (LARS) for building linear models and show how their new method relates to other methods that have been proposed recently. I found the paper to be very stimulating and found the additional insight that it provides about the Lasso technique to be of particular interest. My comments center around the question of how we can select linear models that conform with the marginality principle [Nelder (1977, 1994) and McCullagh and Nelder (1989)]; that is, the response surface is invariant under scaling and translation of the explanatory variables in the model. Recently one of my interests was to explore whether the Lasso technique or the nonnegative garrote [Breiman (1995)] could be modified such that it incorporates the marginality principle. However, it does not seem to be a trivial matter to change the criteria that these techniques minimize in such a way that the marginality principle is incorporated in a satisfactory manner. On the other hand, it seems to be straightforward to modify the LARS technique to incorporate this principle. In their paper, EHJT address this issue somewhat in passing when they suggest toward the end of Section 3 that one first fit main effects only and interactions in a second step to control the order in which variables are allowed to enter the model. However, such a two-step procedure may have a somewhat less than optimal behavior as the following, admittedly artificial, example shows. Assume we have a vector of explanatory variables X =(X

377 citations


Journal ArticleDOI
TL;DR: This discussion of LARS focuses on out-of-sample predictive performance: it compares the $C_p$ statistic with cross-validation for selecting the degree of shrinkage on the diabetes, Boston housing and Servo data, and notes connections between boosting and stagewise algorithms.
Abstract: Algorithms for simultaneous shrinkage and selection in regression and classification provide attractive solutions to knotty old statistical challenges. Nevertheless, as far as we can tell, Tibshirani's Lasso algorithm has had little impact on statistical practice. Two particular reasons for this may be the relative inefficiency of the original Lasso algorithm and the relative complexity of more recent Lasso algorithms [e.g., Osborne, Presnell and Turlach (2000)]. Efron, Hastie, Johnstone and Tibshirani have provided an efficient, simple algorithm for the Lasso as well as algorithms for stagewise regression and the new least angle regression. As such this paper is an important contribution to statistical computing. 1. Predictive performance. The authors say little about predictive performance issues. In our work, however, the relative out-of-sample predictive performance of LARS, Lasso and Forward Stagewise (and variants thereof) takes center stage. Interesting connections exist between boosting and stagewise algorithms so predictive comparisons with boosting are also of interest. The authors present a simple $C_p$ statistic for LARS. In practice, a cross-validation (CV) type approach for selecting the degree of shrinkage, while computationally more expensive, may lead to better predictions. We considered this using the LARS software. Here we report results for the authors' diabetes data, the Boston housing data and the Servo data from the UCI Machine Learning Repository. Specifically, we held out 10% of the data and chose the shrinkage level using either $C_p$ or nine-fold CV using 90% of the data. Then we estimated mean square error (MSE) on the 10% hold-out sample. Table 1 shows the results for main-effects models. Table 1 exhibits two particular characteristics. First, as expected, Stagewise, LARS and Lasso perform similarly. Second, $C_p$ performs as well as cross-validation; if this holds up more generally, larger-scale applications will want to use $C_p$ to select the degree of shrinkage.

360 citations
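
For reference, the $C_p$-type statistic discussed in the entry above is usually written in the following form. The notation is assumed here rather than taken from the abstract: $\hat{\mu}_k$ is the fitted vector after $k$ LARS steps, $\bar{\sigma}^2$ is a residual variance estimate (for instance from the full least squares fit), $n$ is the sample size, and $k$ stands in for the degrees of freedom, which is the approximation proposed for LARS by Efron et al.

$$
C_p(\hat{\mu}_k) \;=\; \frac{\lVert y - \hat{\mu}_k \rVert^2}{\bar{\sigma}^2} \;-\; n \;+\; 2k
$$

The degree of shrinkage is then chosen as the step $k$ that minimizes $C_p(\hat{\mu}_k)$, which requires only one pass over the LARS path, whereas cross-validation refits the path on each fold.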


Posted Content
TL;DR: This tutorial provides an overview of and introduction to Rissanen's Minimum Description Length (MDL) Principle: a conceptual, non-technical first chapter serves as a basis for the technical second chapter, in which all the ideas are made mathematically precise.
Abstract: This tutorial provides an overview of and introduction to Rissanen's Minimum Description Length (MDL) Principle. The first chapter provides a conceptual, entirely non-technical introduction to the subject. It serves as a basis for the technical introduction given in the second chapter, in which all the ideas of the first chapter are made mathematically precise. The main ideas are discussed in great conceptual and technical detail. This tutorial is an extended version of the first two chapters of the collection "Advances in Minimum Description Length: Theory and Application" (edited by P. Grunwald, I.J. Myung and M. Pitt, to be published by the MIT Press, Spring 2005).

353 citations


Journal ArticleDOI
TL;DR: The LAR–Lasso–boosting relationship opens the door for new insights on existing methods' underlying statistical mechanisms and for the development of new and promising methodology in wider statistical domains: robust fitting, classification, machine learning and more.
Abstract: 1. Introduction. We congratulate the authors on their excellent work. The paper combines elegant theory and useful practical results in an intriguing manner. The LAR–Lasso–boosting relationship opens the door for new insights on existing methods' underlying statistical mechanisms and for the development of new and promising methodology. Two issues in particular have captured our attention, as their implications go beyond the squared error loss case presented in this paper, into wider statistical domains: robust fitting, classification, machine learning and more. We concentrate our discussion on these two results and their extensions.

346 citations


Journal ArticleDOI
TL;DR: The LARS method as discussed by the authors is based on a recursive procedure selecting, at each step, the covariates having largest absolute correlation with the response variable, which enables recovering the estimates given by the Lasso and Stagewise.
Abstract: DISCUSSION OF “LEAST ANGLE REGRESSION” BY EFRON ET AL. By Jean-Michel Loubes and Pascal Massart, Université Paris-Sud. The issue of model selection has drawn the attention of both applied and theoretical statisticians for a long time. Indeed, there has been an enormous range of contribution in model selection proposals, including work by Akaike (1973), Mallows (1973), Foster and George (1994), Birgé and Massart (2001a) and Abramovich, Benjamini, Donoho and Johnstone (2000). Over the last decade, modern computer-driven methods have been developed such as All Subsets, Forward Selection, Forward Stagewise or Lasso. Such methods are useful in the setting of the standard linear model, where we observe noisy data and wish to predict the response variable using only a few covariates, since they provide automatically linear models that fit the data. The procedure described in this paper is, on the one hand, numerically very efficient and, on the other hand, very general, since, with slight modifications, it enables us to recover the estimates given by the Lasso and Stagewise. 1. Estimation procedure. The “LARS” method is based on a recursive procedure selecting, at each step, the covariates having largest absolute correlation with the response y. In the case of an orthogonal design, the estimates can then be viewed as an l

341 citations


Journal ArticleDOI
TL;DR: This discussion argues that LARS, like ordinary least squares, implicitly assumes that the conditional distribution F(y|x) depends on x only through a single linear combination x′β, and that the structural dimension of the regression problem should be decided before fitting a linear model.
Abstract: Most of this article concerns the uses of LARS and the two related methods in the age-old, "somewhat notorious," problem of "[a]utomatic model-building algorithms..." for linear regression. In the following, I will confine my comments to this notorious problem and to the use of LARS and its relatives to solve it. 1. The implicit assumption. Suppose the response is y, and we collect the m predictors into a vector x, the realized data into an n × m matrix X and the response is the n-vector Y. If P is the projection onto the column space of (1, X), then LARS, like ordinary least squares (OLS), assumes that, for the purposes of model building, Y can be replaced by $\hat{Y} = PY$ without loss of information. In large samples, this is equivalent to the assumption that the conditional distributions F(y|x) can be written as F(y|x) = F(y|x′β) (1.1) for some unknown vector β. Efron, Hastie, Johnstone and Tibshirani use this assumption in the definition of the LARS algorithm and in estimating residual variance by $\hat{\sigma}^2 = \|(I - P)Y\|^2/(n - m - 1)$. For LARS to be reasonable, we need to have some assurance that this particular assumption holds or that it is relatively benign. If this assumption is not benign, then LARS like OLS is unlikely to produce useful results. A more general alternative to (1.1) is F(y|x) = F(y|x′B), (1.2) where B is an m × d rank d matrix. The smallest value of d for which (1.2) holds is called the structural dimension of the regression problem [Cook (1998)]. An obvious precursor to fitting linear regression is deciding on the structural dimension, not proceeding as if d = 1. For the diabetes data used

335 citations


Journal ArticleDOI
TL;DR: A discussion of ``Least angle regression'' by Efron et al. [math.ST/0406456].
Abstract: Discussion of ``Least angle regression'' by Efron et al. [math.ST/0406456]

334 citations


Journal ArticleDOI
TL;DR: This research attacks this question head on, introducing not only a computationally efficient algorithm and method, LARS (and its derivatives), but at the same time introducing comprehensive theory explaining the intricate details of the procedure as well as theory to guide its practical implementation.
Abstract: Being able to reliably, and automatically, select variables in linear regression models is a notoriously difficult problem. This research attacks this question head on, introducing not only a computationally efficient algorithm and method, LARS (and its derivatives), but at the same time introducing comprehensive theory explaining the intricate details of the procedure as well as theory to guide its practical implementation. This is a fascinating paper and I commend the authors for this important work. Automatic variable selection, the main theme of this paper, has many goals. So before embarking upon a discussion of the paper it is important to first sit down and clearly identify what the objectives are. The authors make it clear in their introduction that, while often the goal in variable selection is to select a "good" linear model, where goodness is measured in terms of prediction accuracy performance, it is also important at the same time to choose models which lean toward the parsimonious side. So here the goals are pretty clear: we want good prediction error performance but also simpler models. These are certainly reasonable objectives and quite justifiable in many scientific settings. At the same time, however, one should recognize the difficulty of the task, as the two goals, low prediction error and smaller models, can be diametrically opposed. By this I mean that certainly from an oracle point of view it is true that minimizing prediction error will identify the true model, and thus, by going after prediction error (in a perfect world), we will also get smaller models by default. However, in practice, what happens is that small gains in prediction error often translate into larger models and less dimension reduction. So as procedures get better at reducing prediction error, they can also get worse at picking out variables accurately. Unfortunately, I have some misgivings that LARS might be falling into this trap. Mostly my concern is fueled by the fact that Mallows' $C_p$ is the criterion used for determining the optimal LARS model. The use of $C_p$

Journal ArticleDOI
TL;DR: A discussion of ``Least angle regression'' by Efron et al. [math.ST/0406456].
Abstract: Discussion of ``Least angle regression'' by Efron et al. [math.ST/0406456]

Posted Content
TL;DR: In this paper, a brief overview of nonparametric techniques that are useful for financial econometric problems is given, including estimation and inferences of instantaneous returns and volatility functions of time-homogeneous and time-dependent diffusion processes, and estimation of transition densities and state price densities.
Abstract: This paper gives a brief overview on the nonparametric techniques that are useful for financial econometric problems. The problems include estimation and inferences of instantaneous returns and volatility functions of time-homogeneous and time-dependent diffusion processes, and estimation of transition densities and state price densities. We first briefly describe the problems and then outline main techniques and main results. Some useful probabilistic aspects of diffusion processes are also briefly summarized to facilitate our presentation and applications.

Journal ArticleDOI
TL;DR: In this article, asymptotic properties of the nonconcave penalized likelihood estimators of Fan and Li are established when the number of parameters grows with the sample size: an oracle property, asymptotic normality, and consistency of the sandwich formula for the covariance matrix.
Abstract: A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed by Fan and Li to simultaneously estimate parameters and select important variables. They demonstrated that this class of procedures has an oracle property when the number of parameters is finite. However, in most model selection problems the number of parameters should be large and grow with the sample size. In this paper some asymptotic properties of the nonconcave penalized likelihood are established for situations in which the number of parameters tends to \infty as the sample size increases. Under regularity conditions we have established an oracle property and the asymptotic normality of the penalized likelihood estimators. Furthermore, the consistency of the sandwich formula of the covariance matrix is demonstrated. Nonconcave penalized likelihood ratio statistics are discussed, and their asymptotic distributions under the null hypothesis are obtained by imposing some mild conditions on the penalty functions.

Journal ArticleDOI
TL;DR: It is shown that, for selection among normal linear models, the optimal predictive model is often the median probability model, which is defined as the model consisting of those variables which have overall posterior probability greater than or equal to 1/2 of being in a model.
Abstract: Often the goal of model selection is to choose a model for future prediction, and it is natural to measure the accuracy of a future prediction by squared error loss. Under the Bayesian approach, it is commonly perceived that the optimal predictive model is the model with highest posterior probability, but this is not necessarily the case. In this paper we show that, for selection among normal linear models, the optimal predictive model is often the median probability model, which is defined as the model consisting of those variables which have overall posterior probability greater than or equal to 1/2 of being in a model. The median probability model often differs from the highest probability model.
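
The definition of the median probability model in the abstract above can be made concrete with a short sketch. The function names and the three-model posterior are hypothetical and chosen only to show that the median probability model can differ from the highest probability model; the posterior model probabilities are assumed to be given.

```python
def inclusion_probabilities(model_probs):
    """Sum posterior model probabilities over all models containing each variable."""
    inclusion = {}
    for variables, prob in model_probs.items():
        for name in variables:
            inclusion[name] = inclusion.get(name, 0.0) + prob
    return inclusion

def median_probability_model(model_probs):
    """Variables whose overall posterior inclusion probability is at least 1/2."""
    inclusion = inclusion_probabilities(model_probs)
    return sorted(name for name, prob in inclusion.items() if prob >= 0.5)

# Hypothetical posterior over three candidate models (probabilities sum to 1).
model_probs = {
    ("x1",): 0.35,
    ("x2", "x3"): 0.33,
    ("x2",): 0.32,
}
highest_probability_model = max(model_probs, key=model_probs.get)
median_model = median_probability_model(model_probs)
```

In this toy posterior the single most probable model contains only `x1`, yet `x2` is the only variable with inclusion probability of at least 1/2 (0.33 + 0.32), so the two selection rules disagree.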

Journal ArticleDOI
TL;DR: In this paper, asymptotic properties of $M$-estimates of regression parameters in linear models with dependent errors are studied, and the results are applied to linear models whose errors are short-range dependent linear processes, heavy-tailed linear processes and some widely used nonlinear time series.
Abstract: We study asymptotic properties of $M$-estimates of regression parameters in linear models in which errors are dependent. Weak and strong Bahadur representations of the $M$-estimates are derived and a central limit theorem is established. The results are applied to linear models with errors being short-range dependent linear processes, heavy-tailed linear processes and some widely used nonlinear time series.

Journal ArticleDOI
TL;DR: This paper presents a model selection technique for estimation in semiparametric regression models of the type $Y_i=\beta^{\prime}\underline{X}_i+f(T_i)+W_i$, $i=1,\dots,n$, in which the parametric and nonparametric components are estimated simultaneously.
Abstract: This paper presents a model selection technique of estimation in semiparametric regression models of the type $Y_i=\beta^{\prime}\underline{X}_i+f(T_i)+W_i$, $i=1,\dots,n$. The parametric and nonparametric components are estimated simultaneously by this procedure. Estimation is based on a collection of finite-dimensional models, using a penalized least squares criterion for selection. We show that by tailoring the penalty terms developed for nonparametric regression to semiparametric models, we can consistently estimate the subset of nonzero coefficients of the linear part. Moreover, the selected estimator of the linear component is asymptotically normal.

Journal ArticleDOI
TL;DR: In this article, the authors provide theorems on convergence rates of posterior distributions that can be applied to obtain good convergence rates in the context of density estimation as well as regression.
Abstract: The goal of this paper is to provide theorems on convergence rates of posterior distributions that can be applied to obtain good convergence rates in the context of density estimation as well as regression. We show how to choose priors so that the posterior distributions converge at the optimal rate without prior knowledge of the degree of smoothness of the density function or the regression function to be estimated.

Journal ArticleDOI
TL;DR: In this paper, it is shown that the ratio of the total number of citations of any two broad fields of science remains close to constant over the analyzed years, and normalization of total numbers of citations with respect to the number of citations in mathematics is suggested as a tool for comparing scientific impact across different fields of science.
Abstract: Citation distributions for 1992, 1994, 1996, 1997, 1999, and 2001, which were published in the 2004 report of the National Science Foundation, USA, are analyzed. It is shown that the ratio of the total number of citations of any two broad fields of science remains close to constant over the analyzed years. Based on this observation, normalization of total numbers of citations with respect to the number of citations in mathematics is suggested as a tool for comparing scientific impact expressed by the number of citations in different fields of science.

Posted Content
TL;DR: In this paper, a first introduction to the field of phylogenomics is given, as well as a discussion of specific mathematical problems and developments arising from phylogenomics, which is a combination of two major fields in the life sciences: genomics and molecular phylogenetics.
Abstract: The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and the connections between genotypes and phenotypes of individuals. The answers to these questions are all predicated on progress in a variety of computational, statistical, and mathematical fields. The rapid growth in the characterization of genomes has led to the advancement of a new discipline called Phylogenomics. This discipline results from the combination of two major fields in the life sciences: Genomics, i.e., the study of the function and structure of genes and genomes; and Molecular Phylogenetics, i.e., the study of the hierarchical evolutionary relationships among organisms and their genomes. The objective of this article is to offer mathematicians a first introduction to this emerging field, and to discuss specific mathematical problems and developments arising from phylogenomics.

Journal ArticleDOI
TL;DR: In this paper, the authors provide a conceptual framework and formalization for structural nested models in continuous time and show that the resulting estimators are consistent and asymptotically normal.
Abstract: This article studies the estimation of the causal effect of a time-varying treatment on time-to-an-event or on some other continuously distributed outcome. The paper applies to the situation where treatment is repeatedly adapted to time-dependent patient characteristics. The treatment effect cannot be estimated by simply conditioning on these time-dependent patient characteristics, as they may themselves be indications of the treatment effect. This time-dependent confounding is common in observational studies. Robins [(1992) Biometrika 79 321--334, (1998b) Encyclopedia of Biostatistics 6 4372--4389] has proposed the so-called structural nested models to estimate treatment effects in the presence of time-dependent confounding. In this article we provide a conceptual framework and formalization for structural nested models in continuous time. We show that the resulting estimators are consistent and asymptotically normal. Moreover, as conjectured in Robins [(1998b) Encyclopedia of Biostatistics 6 4372--4389], a test for whether treatment affects the outcome of interest can be performed without specifying a model for treatment effect. We illustrate the ideas in this article with an example.

Posted Content
TL;DR: In this article, the authors consider a spiked population model, in which the population eigenvalues are all unit except for a few fixed eigenvalues, and determine the almost sure limits of the sample eigenvalues for a general class of samples.
Abstract: We consider a spiked population model, proposed by Johnstone, whose population eigenvalues are all unit except for a few fixed eigenvalues. The question is to determine how the sample eigenvalues depend on the non-unit population ones when both sample size and population size become large. This paper completely determines the almost sure limits for a general class of samples.

Posted Content
TL;DR: In this paper, the authors consider estimating the common probability density of i.i.d. random variables observed with additive i.i.d. noise and propose a kernel-type estimator that is optimal in a sharp asymptotically minimax sense under the pointwise and the $L_2$ risks.
Abstract: We consider estimation of the common probability density $f$ of i.i.d. random variables $X_i$ that are observed with an additive i.i.d. noise. We assume that the unknown density $f$ belongs to a class $\mathcal{A}$ of densities whose characteristic function is described by the exponent $\exp(-\alpha |u|^r)$ as $|u|\to \infty$, where $\alpha >0$, $r>0$. The noise density is supposed to be known and such that its characteristic function decays as $\exp(-\beta |u|^s)$, as $|u| \to \infty$, where $\beta >0$, $s>0$. Assuming that $r

Journal ArticleDOI
TL;DR: This article presents asymptotic results on sharp minimax density estimation from biased data, introducing a coefficient of difficulty that relates the sizes of direct and biased samples that give the same accuracy of estimation.
Abstract: The concept of biased data is well known and its practical applications range from social sciences and biology to economics and quality control. These observations arise when a sampling procedure chooses an observation with probability that depends on the value of the observation. This is an interesting sampling procedure because it favors some observations and neglects others. It is known that biasing does not change rates of nonparametric density estimation, but no results are available about sharp constants. This article presents asymptotic results on sharp minimax density estimation. In particular, a coefficient of difficulty is introduced that shows the relationship between sample sizes of direct and biased samples that imply the same accuracy of estimation. The notion of the restricted local minimax, where a low-frequency part of the estimated density is known, is introduced; it sheds new light on the phenomenon of nonparametric superefficiency. Results of a numerical study are presented.

Journal ArticleDOI
TL;DR: In this article, the authors show that the resulting higher criticism statistic is effective at resolving a very subtle testing problem: testing whether n normal means are all zero versus the alternative that a small fraction is nonzero.
Abstract: Higher criticism, or second-level significance testing, is a multiple-comparisons concept mentioned in passing by Tukey. It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given \alpha-level to the expected fraction under the joint null. In fact, he suggested standardizing the difference of the two quantities and forming a z-score; the resulting z-score tests the significance of the body of significance tests. We consider a generalization, where we maximize this z-score over a range of significance levels 0<\alpha\leq\alpha_0. We are able to show that the resulting higher criticism statistic is effective at resolving a very subtle testing problem: testing whether n normal means are all zero versus the alternative that a small fraction is nonzero. The subtlety of this ``sparse normal means'' testing problem can be seen from work of Ingster and Jin, who studied such problems in great detail. In their studies, they identified an interesting range of cases where the small fraction of nonzero means is so small that the alternative hypothesis exhibits little noticeable effect on the distribution of the p-values either for the bulk of the tests or for the few most highly significant tests. In this range, when the amplitude of nonzero means is calibrated with the fraction of nonzero means, the likelihood ratio test for a precisely specified alternative would still succeed in separating the two hypotheses.
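
The statistic described in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the maximum over significance levels is evaluated only at the observed p-values (where the empirical fraction of significances jumps); the function name `higher_criticism`, the default `alpha0=0.5`, and the synthetic p-values are assumptions, not taken from the paper.

```python
import numpy as np

def higher_criticism(p_values, alpha0=0.5):
    """Maximum standardized excess of small p-values over levels in (0, alpha0]."""
    p = np.sort(np.asarray(p_values, dtype=float))
    n = p.size
    observed_fraction = np.arange(1, n + 1) / n   # fraction of p-values <= p_(i)
    keep = (p > 0) & (p < 1) & (p <= alpha0)      # avoid zero variance, respect alpha0
    if not keep.any():
        raise ValueError("no p-values in (0, alpha0]")
    # z-score of the observed fraction against its expected value under the joint null
    z = np.sqrt(n) * (observed_fraction[keep] - p[keep]) / np.sqrt(p[keep] * (1 - p[keep]))
    return float(z.max())

# Hypothetical example: evenly spread "null" p-values versus the same set
# with ten of them replaced by very small values.
n = 1000
p_null = (np.arange(1, n + 1) - 0.5) / n
p_alt = p_null.copy()
p_alt[-10:] = 1e-6
hc_null = higher_criticism(p_null)
hc_alt = higher_criticism(p_alt)
```

The statistic stays small when the p-values look uniform and becomes large when even a small fraction of them is far smaller than the joint null would produce.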

Journal ArticleDOI
TL;DR: In this paper, the minimax theory for estimating linear functionals is extended to the case of a finite union of convex parameter spaces, and the results developed in this paper have important applications to the theory of adaptation.
Abstract: The minimax theory for estimating linear functionals is extended to the case of a finite union of convex parameter spaces. Upper and lower bounds for the minimax risk can still be described in terms of a modulus of continuity. However, in contrast to the theory for convex parameter spaces, rate-optimal procedures are often required to be nonlinear. A construction of such nonlinear procedures is given. The results developed in this paper have important applications to the theory of adaptation.

Posted Content
TL;DR: This paper reviews an approach to estimating the causal effect of a time-varying treatment on time to some event of interest, designed for situations where the treatment may have been repeatedly adapted to patient characteristics, which themselves may also be time-dependent.
Abstract: In this paper we review an approach to estimating the causal effect of a time-varying treatment on time to some event of interest. This approach is designed for the situation where the treatment may have been repeatedly adapted to patient characteristics, which themselves may also be time-dependent. In this situation the effect of the treatment cannot simply be estimated by conditioning on the patient characteristics, as these may themselves be indicators of the treatment effect. This so-called time-dependent confounding is typical in observational studies. We discuss a new class of failure time models, structural nested failure time models, which can be used to estimate the causal effect of a time-varying treatment, and present methods for estimating and testing the parameters of these models.

Journal ArticleDOI
TL;DR: This paper derives information bounds for the regression parameters in Cox models when data are missing at random, using key lemmas appearing in Robins, Rotnitzky and Zhao [J. Amer. Statist. Assoc. 89 (1994) 846-866] and Robins, Hsieh and Newey [J. Roy. Statist. Soc. Ser. B 57 (1995) 409-424].
Abstract: We derive information bounds for the regression parameters in Cox models when data are missing at random. These calculations are of interest for understanding the behavior of efficient estimation in case-cohort designs, a type of two-phase design often used in cohort studies. The derivations make use of key lemmas appearing in Robins, Rotnitzky and Zhao [J. Amer. Statist. Assoc. 89 (1994) 846-866] and Robins, Hsieh and Newey [J. Roy. Statist. Soc. Ser. B 57 (1995) 409-424], but in a form suited for our purposes here. We begin by summarizing the results of Robins, Rotnitzky and Zhao in a form that leads directly to the projection method which will be of use for our model of interest. We then proceed to derive new information bounds for the regression parameters of the Cox model with data Missing At Random (MAR). In the final section we exemplify our calculations with several models of interest in cohort studies, including an i.i.d. version of the classical case-cohort design of Prentice [Biometrika 73 (1986) 1-11]

Posted Content
TL;DR: In this article, the authors derive simple expressions for the mean residual life in terms of the failure rate for certain classes of distributions which subsume many of the standard cases, and develop an expansion for the mean residual life in terms of Gaussian probability functions for a broad class of ultimately increasing failure rate distributions.
Abstract: In survival or reliability studies, the mean residual life or life expectancy is an important characteristic of the model. Whereas the failure rate can be expressed quite simply in terms of the mean residual life and its derivative, the inverse problem--namely that of expressing the mean residual life in terms of the failure rate--typically involves an integral of a complicated expression. In this paper, we obtain simple expressions for the mean residual life in terms of the failure rate for certain classes of distributions which subsume many of the standard cases. Several results in the literature can be obtained using our approach. Additionally, we develop an expansion for the mean residual life in terms of Gaussian probability functions for a broad class of ultimately increasing failure rate distributions. Some examples are provided to illustrate the procedure.
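
The two directions contrasted in the abstract above can be written out explicitly. The notation is assumed here, not taken from the paper: $S(t)$ is the survival function of a nonnegative lifetime $X$ with a density, $r(t) = -S'(t)/S(t)$ is the failure rate, and $m(t)$ is the mean residual life.

$$
m(t) \;=\; \mathbb{E}[X - t \mid X > t] \;=\; \frac{1}{S(t)} \int_t^{\infty} S(u)\,du .
$$

Differentiating $m(t)\,S(t) = \int_t^{\infty} S(u)\,du$ gives $m'(t)S(t) + m(t)S'(t) = -S(t)$, and dividing by $S(t)$ yields the simple direction:

$$
r(t) \;=\; \frac{1 + m'(t)}{m(t)} .
$$

The inverse direction requires an integral of an exponentiated integral, since $S(t) = \exp\!\left(-\int_0^t r(v)\,dv\right)$:

$$
m(t) \;=\; \int_t^{\infty} \exp\!\left(-\int_t^{u} r(v)\,dv\right) du .
$$

As a quick check, a constant failure rate $r(t) = \lambda$ gives $m(t) = 1/\lambda$ and $m'(t) = 0$, which is consistent with both formulas.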

Posted Content
TL;DR: It is proved that all the three optimal bounds can be nearly achieved via a single "universal" aggregation procedure, which consists in mixing of the initial estimators with the weights obtained by penalized least squares.
Abstract: This paper studies statistical aggregation procedures in regression setting. A motivating factor is the existence of many different methods of estimation, leading to possibly competing estimators. We consider here three different types of aggregation: model selection (MS) aggregation, convex (C) aggregation and linear (L) aggregation. The objective of (MS) is to select the optimal single estimator from the list; that of (C) is to select the optimal convex combination of the given estimators; and that of (L) is to select the optimal linear combination of the given estimators. We are interested in evaluating the rates of convergence of the excess risks of the estimators obtained by these procedures. Our approach is motivated by recent minimax results in Nemirovski (2000) and Tsybakov (2003). There exist competing aggregation procedures achieving optimal convergence separately for each one of (MS), (C) and (L) cases. Since the bounds in these results are not directly comparable with each other, we suggest an alternative solution. We prove that all the three optimal bounds can be nearly achieved via a single "universal" aggregation procedure. We propose such a procedure which consists in mixing of the initial estimators with the weights obtained by penalized least squares. Two different penalties are considered: one of them is related to hard thresholding techniques, the second one is a data dependent L1-type penalty.