
Showing papers in "Annals of Statistics in 2010"


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a penalized linear unbiased selection (PLUS) algorithm, which computes multiple exact local minimizers of a possibly nonconvex penalized loss function in a certain main branch of the graph of critical points of the loss.
Abstract: We propose MC+, a fast, continuous, nearly unbiased and accurate method of penalized variable selection in high-dimensional linear regression. The LASSO is fast and continuous, but biased. The bias of the LASSO may prevent consistent variable selection. Subset selection is unbiased but computationally costly. The MC+ has two elements: a minimax concave penalty (MCP) and a penalized linear unbiased selection (PLUS) algorithm. The MCP provides the convexity of the penalized loss in sparse regions to the greatest extent given certain thresholds for variable selection and unbiasedness. The PLUS computes multiple exact local minimizers of a possibly nonconvex penalized loss function in a certain main branch of the graph of critical points of the penalized loss. Its output is a continuous piecewise linear path running from the origin for infinite penalty to a least squares solution for zero penalty. We prove that at a universal penalty level, the MC+ has high probability of matching the signs of the unknowns, and thus correct selection, without assuming the strong irrepresentable condition required by the LASSO. This selection consistency applies to the case of p≫n, and is proved to hold for exactly the MC+ solution among possibly many local minimizers. We prove that the MC+ attains certain minimax convergence rates in probability for the estimation of regression coefficients in ℓr balls. We use the SURE method to derive degrees of freedom and Cp-type risk estimates for general penalized LSE, including the LASSO and MC+ estimators, and prove their unbiasedness. Based on the estimated degrees of freedom, we propose an estimator of the noise level for proper choice of the penalty level. For full rank designs and general sub-quadratic penalties, we provide necessary and sufficient conditions for the continuity of the penalized LSE. Simulation results overwhelmingly support our claim of superior variable selection properties and demonstrate the computational efficiency of the proposed method.
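For concreteness, the minimax concave penalty described above has a simple closed form, and in the orthonormal-design case it induces a one-dimensional thresholding rule that is soft-thresholding-like near zero and exactly unbiased for large inputs. The sketch below is only an illustration of the MCP function and that univariate rule, not the PLUS path algorithm; the parameter names lam and gamma are illustrative choices.

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty: quadratic blend up to gamma*lam, constant beyond."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def mcp_threshold(z, lam, gamma):
    """Univariate MCP estimator for an orthonormal design (gamma > 1):
    rescaled soft-thresholding below gamma*lam, identity (unbiased) beyond it."""
    a = np.abs(z)
    shrunk = np.sign(z) * np.maximum(a - lam, 0.0) / (1.0 - 1.0 / gamma)
    return np.where(a > gamma * lam, z, shrunk)

z = np.linspace(-4, 4, 9)
print(mcp_threshold(z, lam=1.0, gamma=2.5))
```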

2,382 citations


Journal ArticleDOI
TL;DR: In this article, a new adaptive kernel density estimator based on linear diffusion processes is proposed, which builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate.
Abstract: We present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability.
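The diffusion estimator itself requires solving a smoothing PDE, but the general idea of pilot-driven adaptive smoothing can be illustrated with a classical square-root-law variable-bandwidth KDE (Abramson's rule). The sketch below is a generic estimator in that spirit, not the diffusion estimator proposed in the paper; the pilot bandwidth uses a normal-reference rule purely for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def adaptive_kde(x, grid, h0=None):
    """Variable-bandwidth Gaussian KDE: a pilot estimate sets local bandwidths
    h_i proportional to pilot(x_i)^{-1/2}, so smoothing adapts to local density."""
    x = np.asarray(x, float)
    pilot = gaussian_kde(x)(x)                     # pilot density at the data points
    g = np.exp(np.mean(np.log(pilot)))             # geometric mean, for scale invariance
    if h0 is None:
        h0 = 1.06 * x.std() * len(x) ** (-0.2)     # normal-reference global bandwidth
    h = h0 * np.sqrt(g / pilot)                    # larger bandwidth where pilot density is low
    z = (grid[:, None] - x[None, :]) / h[None, :]
    return np.mean(np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi)), axis=1)

x = np.concatenate([np.random.normal(0, 1, 500), np.random.normal(5, 0.2, 100)])
grid = np.linspace(-4, 7, 200)
density = adaptive_kde(x, grid)
```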

1,410 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the graph associated with a binary Ising Markov random field is considered, where the neighborhood of any given node is estimated by performing logistic regression subject to an l1-constraint.
Abstract: We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on l1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an l1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n=Ω(d3log p) with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n=Ω(d2log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.
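The neighborhood-selection step lends itself to a direct sketch: for each node, regress its binary value on all other nodes with an l1-penalized logistic regression and take the nonzero coefficients as estimated neighbors. Below is a minimal illustration with scikit-learn; the regularization strength C and the symmetrization rule are user choices, not the theoretically calibrated penalty of the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_ising_graph(X, C=0.1):
    """X: (n, p) matrix with entries in {0, 1}. Returns a symmetrized adjacency
    matrix from l1-regularized logistic regression of each node on the others."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for s in range(p):
        others = np.delete(np.arange(p), s)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[:, others], X[:, s])
        adj[s, others] = np.abs(clf.coef_.ravel()) > 1e-8
    # AND rule: keep an edge only if both endpoints select each other
    return adj & adj.T
```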

776 citations


Journal ArticleDOI
TL;DR: In this article, the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression is investigated, and empirical and fully-Bayes approaches to variable selection through examples, theoretical results and simulations are compared.
Abstract: This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
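The automatic multiplicity correction can be seen in a small numerical sketch: under a beta-binomial prior on variable inclusion, the prior odds of any particular nonnull model against the null model shrink as the number of candidate variables p grows, whereas a fixed inclusion probability of 1/2 gives odds that do not depend on p at all. The uniform hyperprior a = b = 1 below is only an illustrative choice.

```python
import numpy as np
from scipy.special import betaln

def log_prior_odds_fullbayes(k, p, a=1.0, b=1.0):
    """Log prior odds of one specific size-k model vs. the null model when the
    inclusion probability w has a Beta(a, b) hyperprior (marginalized out)."""
    return betaln(a + k, b + p - k) - betaln(a, b + p)

def log_prior_odds_fixed(k, w=0.5):
    """Same comparison with a fixed inclusion probability w (no hyperprior)."""
    return k * (np.log(w) - np.log(1 - w))

for p in (10, 100, 1000):
    print(p, log_prior_odds_fullbayes(5, p), log_prior_odds_fixed(5))
# The fully Bayes odds decrease with p (automatic multiplicity penalty);
# the fixed-w odds stay at 0 regardless of how many variables are considered.
```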

620 citations


Journal ArticleDOI
TL;DR: In this article, the optimal rates of convergence for estimating the covariance matrix under both the operator norm and Frobenius norm were established and the minimax upper bound was obtained by constructing a special class of tapering estimators and by studying their risk properties.
Abstract: The covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet to be developed. In this paper we establish the optimal rates of convergence for estimating the covariance matrix under both the operator norm and Frobenius norm. It is shown that optimal procedures under the two norms are different and consequently matrix estimation under the operator norm is fundamentally different from vector estimation. The minimax upper bound is obtained by constructing a special class of tapering estimators and by studying their risk properties. A key step in obtaining the optimal rate of convergence is the derivation of the minimax lower bound. The technical analysis requires new ideas that are quite different from those used in the more conventional function/sequence estimation problems.
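The tapering estimator used for the upper bound has a simple form: the sample covariance is multiplied entrywise by weights that equal 1 near the diagonal, decay linearly, and vanish beyond a bandwidth k. The sketch below uses one common choice of taper weights; the bandwidth k is a user-chosen tuning parameter here, not the rate-optimal value derived in the paper.

```python
import numpy as np

def tapering_estimator(X, k):
    """X: (n, p) data matrix. Tapered sample covariance with bandwidth k:
    weight 1 for |i-j| <= k/2, linear decay to 0 for k/2 < |i-j| < k, 0 beyond."""
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    dist = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    kh = k / 2.0
    W = np.clip(2.0 - dist / kh, 0.0, 1.0)   # 1 inside kh, linear decay, 0 past k
    return W * S

X = np.random.randn(200, 50)
Sigma_hat = tapering_estimator(X, k=10)
```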

524 citations


Journal ArticleDOI
TL;DR: The result shows that group Lasso is superior to standard Lasso for strongly group-sparse signals, and provides a convincing theoretical justification for using group sparse regularization when the underlying group structure is consistent with the data.
Abstract: This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly group-sparse signals. This provides a convincing theoretical justification for using group sparse regularization when the underlying group structure is consistent with the data. Moreover, the theory predicts some limitations of the group Lasso formulation that are confirmed by simulation studies.
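The computational ingredient behind group Lasso is groupwise (block) soft-thresholding: within each group the coefficient block is shrunk toward zero as a whole, so a group is either entirely zero or entirely nonzero. Below is a minimal proximal-gradient sketch of the least-squares group Lasso; the group definitions, step size and penalty level are illustrative choices, not the settings studied in the paper.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Shrink the whole block v toward zero: max(0, 1 - t/||v||_2) * v."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for 0.5*||y - Xb||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2.
    `groups` is a list of index arrays partitioning the columns of X."""
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad
        for g in groups:
            beta[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return beta
```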

523 citations


Journal ArticleDOI
TL;DR: The authors propose a two-sample test for means of high-dimensional data when the data dimension is much larger than the sample size; the test does not require explicit conditions on the relationship between data dimension and sample size.
Abstract: We propose a two-sample test for means of high dimensional data when the data dimension is much larger than the sample size. The classical Hotelling's T² test does not work for this "large p, small n" situation. The proposed test does not require explicit conditions on the relationship between the data dimension and sample size. This offers much flexibility in analyzing high dimensional data. An application of the proposed test is in testing significance for sets of genes, which we demonstrate in an empirical study on a Leukemia data set.
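A simplified sketch of a test statistic of this flavor is given below: it centers the squared distance between the sample means by trace terms so that no inverse covariance matrix is needed, and standardizes with plug-in trace estimates. This is only illustrative; the paper's statistic is built from U-statistics with more careful bias-corrected trace estimators, so the plug-in version here should not be taken as the authors' procedure.

```python
import numpy as np
from scipy.stats import norm

def high_dim_mean_test(X, Y):
    """Two-sample test for equality of means when p may exceed n (plug-in sketch)."""
    n1, n2 = X.shape[0], Y.shape[0]
    S1, S2 = np.cov(X, rowvar=False), np.cov(Y, rowvar=False)
    diff = X.mean(0) - Y.mean(0)
    stat = diff @ diff - np.trace(S1) / n1 - np.trace(S2) / n2
    var_hat = (2 / (n1 * (n1 - 1)) * np.trace(S1 @ S1)
               + 2 / (n2 * (n2 - 1)) * np.trace(S2 @ S2)
               + 4 / (n1 * n2) * np.trace(S1 @ S2))
    z = stat / np.sqrt(var_hat)
    return z, 1 - norm.cdf(z)            # one-sided p-value for H0: equal means

X = np.random.randn(30, 500)
Y = np.random.randn(40, 500)
print(high_dim_mean_test(X, Y))
```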

474 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a more general version of independence learning based on ranking the maximum marginal likelihood estimates in generalized linear models and show that the proposed methods also possess the sure screening property with vanishing false selection rate, which justifies the applicability of such a simple method in a wide spectrum of problems.
Abstract: Ultrahigh dimensional variable selection plays an increasingly important role in contemporary scientific discoveries and statistical research. Among others, Fan and Lv (2008) propose an independence screening framework by ranking the marginal correlations. They showed that the correlation ranking procedure possesses a sure independence screening property within the context of the linear model with Gaussian covariates and responses. In this paper, we propose a more general version of independence learning that ranks the maximum marginal likelihood estimates or the maximum marginal likelihood itself in generalized linear models. We show that the proposed methods, with Fan and Lv (2008) as a very special case, also possess the sure screening property with vanishing false selection rate. The conditions under which the independence learning possesses the sure screening property are surprisingly simple. This justifies the applicability of such a simple method in a wide spectrum of problems. We quantify explicitly the extent to which the dimensionality can be reduced by independence screening, which depends on the interactions of the covariance matrix of covariates and true parameters. Simulation studies are used to illustrate the utility of the proposed approaches.
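The screening step is easy to sketch: fit a marginal GLM of the response on each covariate separately, rank covariates by the magnitude of the marginal coefficient (or by the marginal likelihood), and retain the top d. The snippet below does this for logistic regression with statsmodels; the retained-set size d and the use of the absolute marginal MLE as the ranking criterion are illustrative choices.

```python
import numpy as np
import statsmodels.api as sm

def marginal_mle_screening(X, y, d):
    """Rank features by |marginal logistic-regression slope| and keep the top d indices."""
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        xj = sm.add_constant(X[:, j])
        fit = sm.Logit(y, xj).fit(disp=0)
        scores[j] = abs(fit.params[1])           # marginal MLE of the slope
    return np.argsort(scores)[::-1][:d]

X = np.random.randn(200, 2000)
y = (X[:, 0] - X[:, 1] + np.random.randn(200) > 0).astype(int)
keep = marginal_mle_screening(X, y, d=50)
```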

462 citations


Journal ArticleDOI
TL;DR: In this paper, the adaptive group Lasso was used to select nonzero components in a nonparametric additive model of a conditional mean function, where the additive components are approximated by truncated series expansions with B-spline bases, and the problem of component selection becomes that of selecting the groups of coefficients in the expansion.
Abstract: We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is "small" relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method.

399 citations


Journal ArticleDOI
TL;DR: In this article, a moment-based notion of dependence for functional time series which involves m-dependence is introduced, and the impact of dependence on several important statistical procedures for functional data is investigated.
Abstract: Functional data often arise from measurements on fine time grids and are obtained by separating an almost continuous time record into natural consecutive intervals, for example, days. The functions thus obtained form a functional time series, and the central issue in the analysis of such data consists in taking into account the temporal dependence of these functional observations. Examples include daily curves of financial transaction data and daily patterns of geophysical and environmental data. For scalar and vector valued stochastic processes, a large number of dependence notions have been proposed, mostly involving mixing type distances between σ-algebras. In time series analysis, measures of dependence based on moments have proven most useful (autocovariances and cumulants). We introduce a moment-based notion of dependence for functional time series which involves m-dependence. We show that it is applicable to linear as well as nonlinear functional time series. Then we investigate the impact of dependence thus quantified on several important statistical procedures for functional data. We study the estimation of the functional principal components, the long-run covariance matrix, change point detection and the functional linear model. We explain when temporal dependence affects the results obtained for i.i.d. functional observations and when these results are robust to weak dependence.

250 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider nonparametric estimation of the mean and covariance functions for functional/longitudinal data and derive almost sure rates of convergence for principal component analysis using the estimated covariance function.
Abstract: We consider nonparametric estimation of the mean and covariance functions for functional/longitudinal data. Strong uniform convergence rates are developed for estimators that are local-linear smoothers. Our results are obtained in a unified framework in which the number of observations within each curve/cluster can be of any rate relative to the sample size. We show that the convergence rates for the procedures depend on both the number of sample curves and the number of observations on each curve. For sparse functional data, these rates are equivalent to the optimal rates in nonparametric regression. For dense functional data, root-n rates of convergence can be achieved with proper choices of bandwidths. We further derive almost sure rates of convergence for principal component analysis using the estimated covariance function. The results are illustrated with simulation studies.
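For reference, the building block studied here is the local-linear smoother: at each target point, fit a weighted least-squares line with kernel weights centered at that point and report the intercept. A minimal sketch with a Gaussian kernel and a user-chosen bandwidth follows; it is the generic smoother, not the paper's treatment of within-curve dependence or bandwidth choice.

```python
import numpy as np

def local_linear(x, y, grid, h):
    """Local-linear kernel smoother of y on x, evaluated on `grid` with bandwidth h."""
    fitted = np.empty(len(grid))
    for k, x0 in enumerate(grid):
        u = x - x0
        w = np.exp(-0.5 * (u / h) ** 2)                    # Gaussian kernel weights
        W = np.sqrt(w)
        A = np.column_stack([np.ones_like(u), u]) * W[:, None]
        coef, *_ = np.linalg.lstsq(A, y * W, rcond=None)   # weighted LS of y on (1, x - x0)
        fitted[k] = coef[0]                                # intercept = fitted value at x0
    return fitted

x = np.sort(np.random.rand(300))
y = np.sin(2 * np.pi * x) + 0.2 * np.random.randn(300)
m_hat = local_linear(x, y, np.linspace(0, 1, 100), h=0.05)
```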

Journal ArticleDOI
TL;DR: In this paper, a new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems.
Abstract: A new multivariate concept of quantile, based on a directional version of Koenker and Bassett’s traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the classical halfspace depth contours associated with the name of Tukey. This relation does not only allow for efficient depth contour computations by means of parametric linear programming, but also for transferring from the quantile to the depth universe such asymptotic results as Bahadur representations. Finally, linear programming duality opens the way to promising developments in depth-related multivariate rank-based inference.
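One way to compute an empirical directional quantile of this kind is by projection: for a unit direction u, run an ordinary quantile regression of the projection u'Y on coordinates of the orthogonal complement of u. The sketch below uses statsmodels' QuantReg for that inner step; it is a simplified illustration of the directional construction for a single direction and level, not the authors' parametric-linear-programming algorithm for whole contours.

```python
import numpy as np
from scipy.linalg import null_space
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def directional_quantile(Y, u, tau):
    """Coefficients (intercept, slopes) of the tau-quantile hyperplane in direction u:
    quantile regression of u'Y on an orthonormal basis of the complement of u."""
    u = u / np.linalg.norm(u)
    Gamma = null_space(u.reshape(1, -1))       # (d, d-1) basis of u's orthogonal complement
    resp = Y @ u                               # projection on the direction of interest
    covs = sm.add_constant(Y @ Gamma)          # projections on the complement
    res = QuantReg(resp, covs).fit(q=tau)
    return res.params, Gamma

Y = np.random.randn(500, 2)
params, Gamma = directional_quantile(Y, np.array([1.0, 0.0]), tau=0.25)
```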

Journal ArticleDOI
TL;DR: In this paper, the authors show that the performance of higher criticism can be improved by exploiting the nature of the correlation among the noise variables; indeed, the case of independent noise is the most difficult of all from a statistical viewpoint, and more accurate signal detection can be obtained when correlation is present.
Abstract: Higher criticism is a method for detecting signals that are both sparse and weak. Although first proposed in cases where the noise variables are independent, higher criticism also has reasonable performance in settings where those variables are correlated. In this paper we show that performance can be improved by using a modified approach that exploits the nature of the correlation and the potential advantages it has to offer. Indeed, it turns out that the case of independent noise is the most difficult of all, from a statistical viewpoint, and that more accurate signal detection (for a given level of signal sparsity and strength) can be obtained when correlation is present. We characterize the advantages of correlation by showing how to incorporate them into the definition of an optimal detection boundary. The boundary has particularly attractive properties when correlation decays at a polynomial rate or the correlation matrix is Toeplitz.
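For concreteness, the classical (independent-noise) higher criticism statistic compares the ordered p-values with their uniform expectations and takes a standardized maximum; the modified procedures in the paper alter the inputs to exploit correlation, but the basic statistic looks like the sketch below. The truncation fraction alpha0 is a conventional choice.

```python
import numpy as np
from scipy.stats import norm

def higher_criticism(z, alpha0=0.5):
    """Classical HC statistic from two-sided p-values of z-scores,
    maximized over the smallest alpha0 fraction of the ordered p-values."""
    n = len(z)
    p = np.sort(2 * norm.sf(np.abs(z)))            # ordered two-sided p-values
    m = max(1, int(alpha0 * n))
    i = np.arange(1, m + 1)
    hc = np.sqrt(n) * (i / n - p[:m]) / np.sqrt(p[:m] * (1 - p[:m]))
    return hc.max()

z = np.random.randn(10_000)                        # pure noise
z[:20] += 3.0                                      # plus a few sparse, weak signals
print(higher_criticism(z))
```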

Journal ArticleDOI
TL;DR: In partially linear single-index models, the semiparametrically efficient profile least-squares estimators of regression coefficients are obtained, and a proposed tuning parameter selector, BIC, is shown to identify the true model consistently.
Abstract: In partially linear single-index models, we obtain the semiparametrically efficient profile least-squares estimators of regression coefficients. We also employ the smoothly clipped absolute deviation penalty (SCAD) approach to simultaneously select variables and estimate regression coefficients. We show that the resulting SCAD estimators are consistent and possess the oracle property. Subsequently, we demonstrate that a proposed tuning parameter selector, BIC, identifies the true model consistently. Finally, we develop a linear hypothesis test for the parametric coefficients and a goodness-of-fit test for the nonparametric component, respectively. Monte Carlo studies are also presented.
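For reference, the SCAD penalty used in the variable-selection step is the piecewise function sketched below, with a conventionally set to 3.7. This only evaluates the penalty and its derivative; it is not the profile least-squares estimation procedure itself.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic blend, then constant (unbiased tail)."""
    t = np.abs(t)
    p1 = lam * t
    p2 = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    p3 = lam**2 * (a + 1) / 2
    return np.where(t <= lam, p1, np.where(t <= a * lam, p2, p3))

def scad_derivative(t, lam, a=3.7):
    """Derivative for t > 0: lam on [0, lam], then (a*lam - t)_+ / (a - 1)."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))

print(scad_penalty(np.array([0.5, 2.0, 10.0]), lam=1.0))
```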

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method for the detection of either favored or avoided distances between genomic events along DNA sequences by using a Hawkes' process, which satisfies an oracle inequality even for quite complex families of models.
Abstract: The aim of this paper is to provide a new method for the detection of either favored or avoided distances between genomic events along DNA sequences. These events are modeled by a Hawkes’ process. The biological problem is actually complex enough to need a non asymptotic penalized model selection approach. We provide a theoretical penalty that satisfies an oracle inequality even for quite complex families of models. The consecutive theoretical estimator is
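As a point of reference for the model being fitted, a Hawkes process has conditional intensity equal to a baseline rate plus a sum of kernel contributions from past events; favored or avoided distances show up as mass or gaps in the kernel h. The sketch below uses an illustrative exponential kernel, whereas the paper estimates h nonparametrically by penalized model selection.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    events = np.asarray(events, float)
    past = events[events < t]
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

events = [1.0, 2.5, 2.7, 6.0]
print(hawkes_intensity(3.0, events, mu=0.2, alpha=0.8, beta=1.5))
```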

Journal ArticleDOI
TL;DR: In this paper, the authors construct adaptive confidence bands that are honest for all densities in a "generic" subset of the union of t-Hölder balls, 0 < t ≤ r, where r is a fixed but arbitrary integer.
Abstract: Given a sample from some unknown continuous density f: ℝ → ℝ, we construct adaptive confidence bands that are honest for all densities in a "generic" subset of the union of t-Hölder balls, 0 < t ≤ r, where r is a fixed but arbitrary integer. The exceptional ("nongeneric") set of densities for which our results do not hold is shown to be nowhere dense in the relevant Hölder-norm topologies. In the course of the proofs we also obtain limit theorems for maxima of linear wavelet and kernel density estimators, which are of independent interest.

Journal ArticleDOI
TL;DR: In this article, a smoothness regularization method for functional linear regression is proposed to provide a unified treatment for both the prediction and estimation problems; it achieves the optimal rates of convergence for both prediction and estimation under conditions weaker than those for the functional principal components based methods.
Abstract: We study in this paper a smoothness regularization method for functional linear regression and provide a unified treatment for both the prediction and estimation problems. By developing a tool on simultaneous diagonalization of two positive definite kernels, we obtain sharper results on the minimax rates of convergence and show that smoothness regularized estimators achieve the optimal rates of convergence for both prediction and estimation under conditions weaker than those for the functional principal components based methods developed in the literature. Despite the generality of the method of regularization, we show that the procedure is easily implementable. Numerical results are obtained to illustrate the merits of the method and to demonstrate the theoretical developments.

Journal ArticleDOI
TL;DR: In this paper, the problem of multiple kernel learning based on penalized empirical risk minimization is discussed, where the complexity penalty is determined jointly by the empirical L2 norms and the reproducing kernel Hilbert space (RKHS) norms induced by the kernels with a data-driven choice of regularization parameters.
Abstract: The problem of multiple kernel learning based on penalized empirical risk minimization is discussed. The complexity penalty is determined jointly by the empirical L2 norms and the reproducing kernel Hilbert space (RKHS) norms induced by the kernels with a data-driven choice of regularization parameters. The main focus is on the case when the total number of kernels is large, but only a relatively small number of them is needed to represent the target function, so that the problem is sparse. The goal is to establish oracle inequalities for the excess risk of the resulting prediction rule showing that the method is adaptive both to the unknown design distribution and to the sparsity of the problem.

Journal ArticleDOI
TL;DR: In this paper, the authors consider spectral and batch means methods for estimating the variance of the asymptotic normal distribution and establish conditions which guarantee that these estimators are strongly consistent as the simulation effort increases.
Abstract: Calculating a Monte Carlo standard error (MCSE) is an important step in the statistical analysis of the simulation output obtained from a Markov chain Monte Carlo experiment. An MCSE is usually based on an estimate of the variance of the asymptotic normal distribution. We consider spectral and batch means methods for estimating this variance. In particular, we establish conditions which guarantee that these estimators are strongly consistent as the simulation effort increases. In addition, for the batch means and overlapping batch means methods we establish conditions ensuring consistency in the mean-square sense which in turn allows us to calculate the optimal batch size up to a constant of proportionality. Finally, we examine the empirical finite-sample properties of spectral variance and batch means estimators and provide recommendations for practitioners.
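The batch means estimator discussed here is short enough to state in full: split the chain into a batches of length b and estimate the asymptotic variance from the spread of the batch means. The sketch below uses the common default b = floor(sqrt(n)); the paper's contribution concerns conditions under which such estimators are consistent and how to choose b optimally, not this routine itself.

```python
import numpy as np

def batch_means_mcse(chain, b=None):
    """Monte Carlo standard error of the chain's mean via non-overlapping batch means."""
    x = np.asarray(chain, float)
    n = len(x)
    if b is None:
        b = int(np.floor(np.sqrt(n)))            # common default batch length
    a = n // b                                   # number of full batches
    means = x[: a * b].reshape(a, b).mean(axis=1)
    sigma2_hat = b * np.sum((means - x[: a * b].mean()) ** 2) / (a - 1)
    return np.sqrt(sigma2_hat / n)

# toy AR(1) "chain" with autocorrelation 0.9
rho, n = 0.9, 100_000
e = np.random.randn(n)
chain = np.empty(n)
chain[0] = e[0]
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + e[t]
print(chain.mean(), batch_means_mcse(chain))
```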

Journal ArticleDOI
TL;DR: In this article, the MU-selectors are used to estimate the sparsity pattern of a sparse vector θ∗ under matrix uncertainty, where the matrix uncertainty is in the fact that X is observed with additive error.
Abstract: We consider the model y = Xθ∗ + ξ, Z = X + Ξ, where the random vector y ∈ ℝ^n and the random n × p matrix Z are observed, the n × p matrix X is unknown, Ξ is an n × p random noise matrix, ξ ∈ ℝ^n is a noise independent of Ξ, and θ∗ is a vector of unknown parameters to be estimated. The matrix uncertainty is in the fact that X is observed with additive error. For dimensions p that can be much larger than the sample size n, we consider the estimation of sparse vectors θ∗. Under matrix uncertainty, the Lasso and Dantzig selector turn out to be extremely unstable in recovering the sparsity pattern (i.e., the set of nonzero components of θ∗), even if the noise level is very small. We suggest new estimators called matrix uncertainty selectors (or, shortly, the MU-selectors) which are close to θ∗ in different norms and in the prediction risk if the restricted eigenvalue assumption on X is satisfied. We also show that under somewhat stronger assumptions, these estimators recover correctly the sparsity pattern.

Journal ArticleDOI
TL;DR: In this article, generalized density-based clustering in which sharply defined clusters such as clusters on lower-dimensional manifolds are allowed was studied and it was shown that accurate clustering is possible even in high dimensions.
Abstract: We study generalized density-based clustering in which sharply defined clusters such as clusters on lower-dimensional manifolds are allowed. We show that accurate clustering is possible even in high dimensions. We propose two data-based methods for choosing the bandwidth and we study the stability properties of density clusters. We show that a simple graph-based algorithm successfully approximates the high density clusters.
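The graph-based procedure mentioned at the end can be sketched directly: estimate the density at the sample points, keep the points whose estimated density exceeds a level, connect kept points that lie within a small distance of each other, and read off connected components as clusters. The bandwidth, level and linking radius below are illustrative user choices, not the data-based selections studied in the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist

def density_clusters(X, level, eps):
    """Label high-density points by connected components of an eps-neighborhood graph."""
    dens = gaussian_kde(X.T)(X.T)                 # KDE evaluated at the sample points
    keep = np.where(dens >= level)[0]
    D = cdist(X[keep], X[keep])
    graph = csr_matrix(D <= eps)                  # adjacency of the eps-graph on kept points
    _, comp = connected_components(graph, directed=False)
    labels = np.full(len(X), -1)                  # -1 marks low-density (unclustered) points
    labels[keep] = comp
    return labels

X = np.vstack([np.random.randn(200, 2), np.random.randn(200, 2) + 6])
labels = density_clusters(X, level=0.005, eps=1.0)
```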

Journal ArticleDOI
TL;DR: In this article, the authors consider the spectrum of certain kernel random matrices, in particular n × n matrices whose (i, j)th entry is f(X_i′X_j/p) or f(‖X_i − X_j‖²/p), where p is the dimension of the data and the X_i are independent data vectors.
Abstract: We place ourselves in the setting of high-dimensional statistical inference where the number of variables p in a dataset of interest is of the same order of magnitude as the number of observations n. We consider the spectrum of certain kernel random matrices, in particular n × n matrices whose (i, j)th entry is f(X_i′X_j/p) or f(‖X_i − X_j‖²/p), where p is the dimension of the data and the X_i are independent data vectors. Here f is assumed to be a locally smooth function. The study is motivated by questions arising in statistics and computer science where these matrices are used to perform, among other things, nonlinear versions of principal component analysis. Surprisingly, we show that in high dimensions, and for the models we analyze, the problem becomes essentially linear, which is at odds with heuristics sometimes used to justify the usage of these methods. The analysis also highlights certain peculiarities of models widely studied in random matrix theory and raises some questions about their relevance as tools to model high-dimensional data encountered in practice.
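The phenomenon can be probed numerically in a few lines: build the kernel matrix with entries f(X_i′X_j/p) for high-dimensional data and compare its spectrum with that of a surrogate obtained from a first-order expansion of f off the diagonal. The kernel function, dimensions, and the simple surrogate below are illustrative; the paper's precise approximation also includes a rank-one correction term that is omitted here.

```python
import numpy as np

n, p = 300, 1000
X = np.random.randn(n, p)
G = X @ X.T / p                                   # inner-product matrix X_i'X_j / p

f = np.exp                                        # a locally smooth kernel function
K = f(G)                                          # kernel random matrix, entries f(X_i'X_j/p)

# Linear surrogate: f(0) + f'(0) * G off the diagonal (f'(0) = 1 for exp),
# with the diagonal matched exactly.
K_lin = f(0.0) + 1.0 * G
np.fill_diagonal(K_lin, f(np.diag(G)))

eig_K = np.sort(np.linalg.eigvalsh(K))
eig_L = np.sort(np.linalg.eigvalsh(K_lin))
# The eigenvalue differences are small relative to the eigenvalue scale when p is
# large, illustrating the "essentially linear" behavior discussed in the abstract.
print(np.max(np.abs(eig_K - eig_L)), eig_K[-1])
```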

Journal ArticleDOI
TL;DR: In this paper, the authors study the problem of deciding, from an n-dimensional Gaussian observation, whether the vector is standard normal or whether a subset of components belonging to a given class of sets has been contaminated with a nonzero mean, and they establish general conditions under which testing is possible or hopeless, with the combinatorial and geometric structure of the class playing a crucial role.
Abstract: We study a class of hypothesis testing problems in which, upon observing the realization of an n-dimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a subset of the components belonging to a certain given class of sets whose elements have been "contaminated," that is, have a mean different from zero. We establish some general conditions under which testing is possible and others under which testing is hopeless with a small risk. The combinatorial and geometric structure of the class of sets is shown to play a crucial role. The bounds are illustrated on various examples.

Journal ArticleDOI
TL;DR: In this article, the authors investigate the consistency of different bootstrap methods for constructing confidence bands in the class of estimators that converge at rate cube-root n; the Grenander estimator (see Grenander (1956)), the nonparametric maximum likelihood estimator of an unknown non-increasing density function f on [0, ∞), is a prototypical example.
Abstract: In this paper we investigate the (in)consistency of different bootstrap methods for constructing confidence bands in the class of estimators that converge at rate cube-root n. The Grenander estimator (see Grenander (1956)), the nonparametric maximum likelihood estimator of an unknown non-increasing density function f on [0, ∞), is a prototypical example. We focus on this example and illustrate different approaches of constructing confidence intervals for f(t0), where t0 is an interior point, i.e., 0 < t0 < ∞. It is claimed that the bootstrap statistic, when generating bootstrap samples from the empirical distribution function Fn, does not have any weak limit, conditional on the data, in probability. A similar phenomenon is shown to hold when bootstrapping from F̃n, the least concave majorant of Fn. We provide a set of sufficient conditions for the consistency of bootstrap methods in this example. A suitable version of smoothed bootstrap is proposed and shown to be strongly consistent. The m out of n bootstrap method is also proved to be consistent while generating samples from
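The object being bootstrapped, the Grenander estimator, can be computed directly as the left derivative of the least concave majorant of the empirical distribution function. The sketch below computes the estimator itself (via an upper-hull scan of the empirical CDF); it is not any of the bootstrap schemes analyzed in the paper.

```python
import numpy as np

def grenander(x):
    """Grenander estimator of a non-increasing density on [0, inf):
    slopes of the least concave majorant (LCM) of the empirical CDF.
    Returns (breakpoints, density value on each interval)."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    xs = np.concatenate([[0.0], x])
    ys = np.arange(n + 1) / n
    hull = [0]                                    # indices of LCM vertices (upper hull)
    for i in range(1, n + 1):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # drop b unless the slope a->b is strictly larger than the slope b->i
            if (ys[b] - ys[a]) * (xs[i] - xs[b]) <= (ys[i] - ys[b]) * (xs[b] - xs[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    knots = xs[hull]
    dens = np.diff(ys[hull]) / np.diff(knots)     # left derivative = density on each interval
    return knots, dens

x = np.random.exponential(size=500)
knots, dens = grenander(x)
```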

Journal ArticleDOI
TL;DR: In this paper, a sieve-based nonlinear least squares (NLS) estimator is proposed to estimate constant and time-varying coefficients in nonlinear ODEs.
Abstract: This article considers estimation of constant and time-varying coefficients in nonlinear ordinary differential equation (ODE) models where analytic closed-form solutions are not available. The numerical solution-based nonlinear least squares (NLS) estimator is investigated in this study. A numerical algorithm such as the Runge-Kutta method is used to approximate the ODE solution. The asymptotic properties are established for the proposed estimators considering both numerical error and measurement error. The B-spline is used to approximate the time-varying coefficients, and the corresponding asymptotic theories in this case are investigated under the framework of the sieve approach. Our results show that if the maximum step size of the p-order numerical algorithm goes to zero at a rate faster than n^{-1/(p∧4)}, the numerical error is negligible compared to the measurement error. This result provides theoretical guidance in selection of the step size for numerical evaluations of ODEs. Moreover, we have shown that the numerical solution-based NLS estimator and the sieve NLS estimator are strongly consistent. The sieve estimator of constant parameters is asymptotically normal with the same asymptotic covariance as that of the case where the true ODE solution is exactly known, while the estimator of the time-varying parameter has the optimal convergence rate under some regularity conditions. The theoretical results are also developed for the case when the step size of the ODE numerical solver does not go to zero fast enough or the numerical error is comparable to the measurement error. We illustrate our approach with both simulation studies and clinical data on HIV viral dynamics.
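The numerical-solution-based NLS estimator for constant parameters can be sketched in a few lines: for candidate parameters, solve the ODE with a Runge-Kutta solver, compare the numerical solution with the noisy observations, and minimize the squared residuals. The logistic-growth model and the solver tolerances below are purely illustrative choices standing in for the models and step-size conditions discussed in the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

t_obs = np.linspace(0, 10, 50)
true = (0.8, 10.0, 0.5)                                   # (r, K, x0) used to simulate data

def ode(t, x, r, K):                                      # logistic growth: dx/dt = r x (1 - x/K)
    return r * x * (1 - x / K)

x_true = solve_ivp(ode, (0, 10), [true[2]], t_eval=t_obs, args=true[:2], rtol=1e-8).y[0]
y_obs = x_true + 0.3 * np.random.randn(len(t_obs))        # add measurement error

def residuals(theta):
    r, K, x0 = theta
    sol = solve_ivp(ode, (0, 10), [x0], t_eval=t_obs, args=(r, K), method="RK45", rtol=1e-8)
    return sol.y[0] - y_obs                               # numerical solution vs. observations

fit = least_squares(residuals, x0=[0.5, 8.0, 1.0])        # NLS estimate of (r, K, x0)
print(fit.x)
```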


Journal ArticleDOI
TL;DR: In this paper, coordinate-independent sparse estimation (CISE) is proposed to simultaneously achieve sparse sufficient dimension reduction and screen out irrelevant and redundant variables efficiently, which results in a Grassmann manifold optimization problem and a fast algorithm is suggested.
Abstract: Sufficient dimension reduction (SDR) in regression, which reduces the dimension by replacing original predictors with a minimal set of their linear combinations without loss of information, is very helpful when the number of predictors is large. The standard SDR methods suffer because the estimated linear combinations usually consist of all original predictors, making it difficult to interpret. In this paper, we propose a unified method, coordinate-independent sparse estimation (CISE), that can simultaneously achieve sparse sufficient dimension reduction and screen out irrelevant and redundant variables efficiently. CISE is subspace oriented in the sense that it incorporates a coordinate-independent penalty term with a broad series of model-based and model-free SDR approaches. This results in a Grassmann manifold optimization problem, and a fast algorithm is suggested. Under mild conditions, based on manifold theories and techniques, it can be shown that CISE would perform asymptotically as well as if the true irrelevant predictors were known, which is referred to as the oracle property. Simulation studies and a real-data example demonstrate the effectiveness and efficiency of the proposed approach.

Journal ArticleDOI
TL;DR: In this paper, the authors established monotonic convergence for a general class of multiplicative algorithms introduced by Silvey, Titterington and Torsney [Comm. Statist. 14 (1978) 1379−1389] for computing optimal designs.
Abstract: Monotonic convergence is established for a general class of multiplicative algorithms introduced by Silvey, Titterington and Torsney [Comm. Statist. Theory Methods 14 (1978) 1379–1389] for computing optimal designs. A conjecture of Titterington [Appl. Stat. 27 (1978) 227–234] is confirmed as a consequence. Optimal designs for logistic regression are used as an illustration.
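For the D-optimality criterion, the multiplicative algorithm in question is a one-line update: reweight each candidate design point in proportion to its current standardized variance. The sketch below works on a finite design space with a plain linear-model information matrix; the quadratic-regression example and the stopping rule are illustrative, and the logistic-regression case in the paper would replace the information matrix accordingly.

```python
import numpy as np

def multiplicative_d_optimal(F, n_iter=200):
    """F: (m, q) matrix of candidate regression vectors f_i. Returns design weights w
    maximizing log det of the information matrix M(w) = sum_i w_i f_i f_i'."""
    m, q = F.shape
    w = np.full(m, 1.0 / m)                                        # start from the uniform design
    for _ in range(n_iter):
        M = F.T @ (w[:, None] * F)
        d = np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)       # d_i = f_i' M^{-1} f_i
        w = w * d / q                                              # multiplicative update; sum_i w_i d_i = q
    return w

grid = np.linspace(-1, 1, 21)
F = np.column_stack([np.ones_like(grid), grid, grid**2])           # quadratic regression on [-1, 1]
w = multiplicative_d_optimal(F)
print(grid[w > 1e-3], w[w > 1e-3])                                 # mass concentrates near -1, 0, 1
```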

Journal ArticleDOI
TL;DR: In this paper, a general sequentially rejective multiple testing procedure is presented, and it is shown that many well-known familywise error controlling methods can be constructed as special cases of this procedure, among which are the procedures of Holm, Shaffer and Hochberg, parallel and serial gatekeeping, modern procedures for multiple testing in graphs, resampling-based multiple testing procedures and even the closed testing and partitioning procedures themselves.
Abstract: Closed testing and partitioning are recognized as fundamental principles of familywise error control. In this paper, we argue that sequential rejection can be considered equally fundamental as a general principle of multiple testing. We present a general sequentially rejective multiple testing procedure and show that many well-known familywise error controlling methods can be constructed as special cases of this procedure, among which are the procedures of Holm, Shaffer and Hochberg, parallel and serial gatekeeping procedures, modern procedures for multiple testing in graphs, resampling-based multiple testing procedures and even the closed testing and partitioning procedures themselves. We also give a general proof that sequentially rejective multiple testing procedures strongly control the familywise error if they fulfill simple criteria of monotonicity of the critical values and a limited form of weak familywise error control in each single step. The sequential rejection principle gives a novel theoretical perspective on many well-known multiple testing procedures, emphasizing the sequential aspect. Its main practical usefulness is for the development of multiple testing procedures for null hypotheses, possibly logically related, that are structured in a graph. We illustrate this by presenting a uniform improvement of a recently published procedure.
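As a concrete special case of the sequential rejection principle, Holm's procedure rejects hypotheses one at a time, relaxing the critical value each time a hypothesis is rejected and stopping at the first non-rejection. A minimal sketch:

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: returns a boolean rejection vector with FWER <= alpha."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for step, idx in enumerate(order):
        if p[idx] <= alpha / (m - step):          # critical value grows as hypotheses are rejected
            reject[idx] = True
        else:
            break                                 # stop at the first non-rejection
    return reject

print(holm([0.001, 0.02, 0.04, 0.30]))
```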

Journal ArticleDOI
TL;DR: In this article, the authors developed the notion of density when functional data are considered in the space determined by the eigenfunctions of principal component analysis, which leads to a transparent and meaningful surrogate for density defined in terms of the average value of the logarithms of the densities of the distributions of principal components for a given dimension.
Abstract: The notion of probability density for a random function is not as straightforward as in finite-dimensional cases. While a probability density function generally does not exist for functional data, we show that it is possible to develop the notion of density when functional data are considered in the space determined by the eigenfunctions of principal component analysis. This leads to a transparent and meaningful surrogate for density defined in terms of the average value of the logarithms of the densities of the distributions of principal components for a given dimension. This density approximation is estimable readily from data. It accurately represents, in a monotone way, key features of small-ball approximations to density. Our results on estimators of the densities of principal component scores are also of independent interest; they reveal interesting shape differences that have not previously been considered. The statistical implications of these results and properties are identified and discussed, and practical ramifications are illustrated in numerical work.
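The density surrogate described above can be sketched for discretized curves: compute principal component scores, estimate a univariate density for each score, and average the log densities over the first few components. The number of components and the kernel density estimator below are illustrative choices, not the paper's asymptotic construction.

```python
import numpy as np
from scipy.stats import gaussian_kde

def log_density_surrogate(curves, n_components=3):
    """curves: (n, T) matrix of discretized functional observations.
    Returns, for each curve, the average of the log densities of its first
    n_components principal component scores."""
    X = curves - curves.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T                     # PC scores for each curve
    logdens = np.zeros(len(curves))
    for j in range(n_components):
        kde = gaussian_kde(scores[:, j])                 # density of the j-th score
        logdens += np.log(kde(scores[:, j]))
    return logdens / n_components

t = np.linspace(0, 1, 100)
curves = np.array([np.sin(2 * np.pi * t) * a + b * t for a, b in np.random.randn(200, 2)])
surrogate = log_density_surrogate(curves)
```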