
Showing papers in "Journal of The Royal Statistical Society Series B-statistical Methodology in 2014"


Journal ArticleDOI
TL;DR: Covariate balancing propensity score (CBPS) estimation is proposed to improve the empirical performance of propensity score matching and weighting methods by exploiting the dual characteristics of the propensity score as a covariate balancing score and the conditional probability of treatment assignment.
Abstract: The propensity score plays a central role in a variety of causal inference settings. In particular, matching and weighting methods based on the estimated propensity score have become increasingly common in observational studies. Despite their popularity and theoretical appeal, the main practical difficulty of these methods is that the propensity score must be estimated. Researchers have found that slight misspecification of the propensity score model can result in substantial bias of estimated treatment effects. This paper introduces a simple and yet powerful new methodology, covariate balancing propensity score (CBPS) estimation, which significantly improves the empirical performance of propensity score methods. The CBPS simultaneously optimizes the covariate balance and the prediction of treatment assignment by exploiting the dual characteristics of the propensity score as a covariate balancing score and the conditional probability of treatment assignment. The CBPS is shown to dramatically improve the poor empirical performance of propensity score matching and weighting methods reported in the literature. In addition, the CBPS can be extended to a number of other important settings, including the estimation of the generalized propensity score for non-binary treatments, the generalization of experimental estimates to a target population, and causal inference in longitudinal settings with marginal structural models. The open-source R package, CBPS, is available for implementing the proposed methods.
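The core idea can be sketched in a few lines: a fitted propensity score should not only predict treatment but also balance the covariates once it is used as an inverse-probability weight. The Python sketch below is illustrative only (it is not the GMM machinery of the CBPS R package, and the simulated assignment model is hypothetical); it fits a plain logistic propensity model and evaluates the covariate-balancing moment conditions that CBPS drives towards zero.

```python
# Illustrative sketch only (not the GMM machinery of the CBPS R package): fit a
# plain logistic propensity model on hypothetical simulated data and evaluate
# the covariate-balancing moment conditions that CBPS targets,
#   E[ T X / pi(X) - (1 - T) X / (1 - pi(X)) ] = 0.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.normal(size=(n, p))
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))        # hypothetical assignment model

ps = LogisticRegression(C=1e6).fit(X, T).predict_proba(X)[:, 1]   # ML propensity scores

balance = (T[:, None] * X / ps[:, None]
           - (1 - T)[:, None] * X / (1 - ps)[:, None]).mean(axis=0)
print("covariate-balancing moments:", np.round(balance, 3))
```

Large entries indicate the weighting imbalance that maximum likelihood estimation can leave behind under misspecification; CBPS instead chooses the propensity model parameters so that these moments, alone or together with the score equations, are set to zero.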

963 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a method to construct confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model by turning the regression data into an approximate Gaussian sequence of point estimators of individual regression coefficients.
Abstract: Summary The purpose of this paper is to propose methodologies for statistical inference of low dimensional parameters with high dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broader context. The theoretical results that are presented provide sufficient conditions for the asymptotic normality of the proposed estimators along with a consistent estimator for their finite dimensional covariance matrices. These sufficient conditions allow the number of variables to exceed the sample size and the presence of many small non-zero coefficients. Our methods and theory apply to interval estimation of a preconceived regression coefficient or contrast as well as simultaneous interval estimation of many regression coefficients. Moreover, the method proposed turns the regression data into an approximate Gaussian sequence of point estimators of individual regression coefficients, which can be used to select variables after proper thresholding. The simulation results that are presented demonstrate the accuracy of the coverage probability of the confidence intervals proposed as well as other desirable properties, strongly supporting the theoretical results.
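A hedged sketch of the low-dimensional projection idea for a single coefficient: correct the lasso estimate of a coordinate using the residual from a lasso regression of that column on the remaining columns, yielding an approximately Gaussian pivot. The tuning parameters and the plug-in noise level `sigma` below are illustrative placeholders (in practice the noise level must itself be estimated), and the data are assumed centred.

```python
# Hedged sketch of a low-dimensional-projection (debiased lasso) confidence
# interval for a single coefficient beta_j.  Tuning parameters and the plug-in
# noise level `sigma` are illustrative placeholders; data are assumed centred.
import numpy as np
from sklearn.linear_model import Lasso

def debiased_ci(X, y, j, lam=0.1, lam_node=0.1, sigma=1.0, z=1.96):
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    X_mj = np.delete(X, j, axis=1)
    gamma = Lasso(alpha=lam_node, fit_intercept=False).fit(X_mj, X[:, j]).coef_
    zj = X[:, j] - X_mj @ gamma                            # relaxed projection direction
    bj = beta[j] + zj @ (y - X @ beta) / (zj @ X[:, j])    # bias-corrected point estimate
    se = sigma * np.linalg.norm(zj) / abs(zj @ X[:, j])    # approximate standard error
    return bj - z * se, bj + z * se
```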

892 citations


Journal ArticleDOI
TL;DR: The joint graphical lasso is proposed, which borrows strength across the classes to estimate multiple graphical models that share certain characteristics, such as the locations or weights of non‐zero edges, based on maximizing a penalized log‐likelihood.
Abstract: We consider the problem of estimating multiple related Gaussian graphical models from a high-dimensional data set with observations belonging to distinct classes. We propose the joint graphical lasso, which borrows strength across the classes in order to estimate multiple graphical models that share certain characteristics, such as the locations or weights of nonzero edges. Our approach is based upon maximizing a penalized log likelihood. We employ generalized fused lasso or group lasso penalties, and implement a fast ADMM algorithm to solve the corresponding convex optimization problems. The performance of the proposed method is illustrated through simulated and real data examples.
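In illustrative notation (K classes with sample covariance matrices S_k and sample sizes n_k), the objective being maximized is a penalized log-likelihood of the form

```latex
% Penalized log-likelihood maximized by the joint graphical lasso
% (notation illustrative, not necessarily the paper's).
\max_{\{\Theta_k \succ 0\}} \;
  \sum_{k=1}^{K} n_k \bigl[ \log\det\Theta_k - \operatorname{tr}(S_k \Theta_k) \bigr]
  \;-\; P(\Theta_1, \dots, \Theta_K),
```

where the penalty P couples the precision matrices across classes: a generalized fused lasso penalty shrinks differences between corresponding entries of different Θ_k (encouraging shared edge weights), while a group lasso penalty shrinks each entry jointly across classes (encouraging shared sparsity patterns).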

757 citations


Journal ArticleDOI
TL;DR: In this paper, the essentials of our paper of 2002 are briefly summarized and compared with other criteria for model comparison; after some comments on the paper's reception and influence, we consider criticisms and proposals for improvement made by us and others.
Abstract: Summary The essentials of our paper of 2002 are briefly summarized and compared with other criteria for model comparison. After some comments on the paper's reception and influence, we consider criticisms and proposals for improvement made by us and others.

521 citations


Journal ArticleDOI
TL;DR: In this article, a discrete-time generative model for social network evolution is proposed, which inherits the richness and flexibility of the class of exponential-family random graph models.
Abstract: Models of dynamic networks - networks that evolve over time - have manifold applications. We develop a discrete-time generative model for social network evolution that inherits the richness and flexibility of the class of exponential-family random graph models. The model - a Separable Temporal ERGM (STERGM) - facilitates separable modeling of the tie duration distributions and the structural dynamics of tie formation. We develop likelihood-based inference for the model, and provide computational algorithms for maximum likelihood estimation. We illustrate the interpretability of the model in analyzing a longitudinal network of friendship ties within a school.
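In illustrative notation, 'separable' means that, conditionally on the network at time t−1, tie formation and tie dissolution are governed by independent ERGMs with their own parameters and statistics, and the network at time t is recovered from both:

```latex
% Separability (illustrative notation): given Y^{t-1}, the formation network
% Y^+ and the dissolution (persistence) network Y^- follow independent ERGMs.
P\bigl(Y^{t} \mid Y^{t-1};\,\theta^{+},\theta^{-}\bigr)
  \;=\; P\bigl(Y^{+} \mid Y^{t-1};\,\theta^{+}\bigr)\,
        P\bigl(Y^{-} \mid Y^{t-1};\,\theta^{-}\bigr),
\qquad
P\bigl(Y^{+}=y^{+} \mid Y^{t-1};\,\theta^{+}\bigr)
  \;\propto\; \exp\bigl\{\theta^{+\top} g^{+}(y^{+}, Y^{t-1})\bigr\}.
```

This is what allows tie duration (dissolution) and tie incidence (formation) to be parameterized and interpreted separately.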

326 citations


Journal ArticleDOI
TL;DR: The ‘bag of little bootstraps’ (BLB) is introduced, which is a new procedure which incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators.
Abstract: Summary The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large data sets—which are increasingly prevalent—the calculation of bootstrap-based quantities can be prohibitively demanding computationally. Although variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computations, these methods are generally not robust to specification of tuning parameters (such as the number of subsampled data points), and they often require knowledge of the estimator's convergence rate, in contrast with the bootstrap. As an alternative, we introduce the ‘bag of little bootstraps’ (BLB), which is a new procedure which incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators. The BLB is well suited to modern parallel and distributed computing architectures and furthermore retains the generic applicability and statistical efficiency of the bootstrap. We demonstrate the BLB's favourable statistical performance via a theoretical analysis elucidating the procedure's properties, as well as a simulation study comparing the BLB with the bootstrap, the m out of n bootstrap and subsampling. In addition, we present results from a large-scale distributed implementation of the BLB demonstrating its computational superiority on massive data, a method for adaptively selecting the BLB's tuning parameters, an empirical study applying the BLB to several real data sets and an extension of the BLB to time series data.
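The workflow is easy to sketch for a simple estimator such as the sample mean: draw small subsamples of size b ≈ n^γ, re-inflate each to size n with multinomial weights, compute the quality assessment (here a standard error) within each subsample, and average across subsamples. The exponent and the subset and replicate counts below are illustrative defaults, not the paper's recommendations.

```python
# Hedged sketch of the bag of little bootstraps (BLB) for the standard error of
# a sample mean; b = n**gamma and the replicate counts are illustrative.
import numpy as np

def blb_se(x, gamma=0.7, n_subsets=10, n_boot=100, rng=None):
    rng = np.random.default_rng(rng)
    n = len(x)
    b = int(n ** gamma)
    subset_ses = []
    for _ in range(n_subsets):
        sub = rng.choice(x, size=b, replace=False)           # one little subsample
        # Resample n points from the subsample via multinomial counts.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_boot)
        means = counts @ sub / n                              # one weighted mean per replicate
        subset_ses.append(means.std(ddof=1))                  # quality assessment on this subset
    return float(np.mean(subset_ses))                         # average across subsets

x = np.random.default_rng(1).normal(size=10_000)
print(blb_se(x), 1 / np.sqrt(len(x)))                         # should be close
```

Because each replicate is represented by counts over only b distinct points, the cost per replicate scales with b rather than n, which is what makes the procedure attractive for parallel and distributed computation.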

318 citations


Journal ArticleDOI
TL;DR: A new estimator, the simultaneous multiscale change point estimator SMUCE, is introduced, which achieves the optimal detection rate of vanishing signals as n→∞, even for an unbounded number of change points.
Abstract: Summary We introduce a new estimator, the simultaneous multiscale change point estimator SMUCE, for the change point problem in exponential family regression. An unknown step function is estimated by minimizing the number of change points over the acceptance region of a multiscale test at a level α. The probability of overestimating the true number of change points K is controlled by the asymptotic null distribution of the multiscale test statistic. Further, we derive exponential bounds for the probability of underestimating K. By balancing these quantities, α will be chosen such that the probability of correctly estimating K is maximized. All results are even non-asymptotic for the normal case. On the basis of these bounds, we construct (asymptotically) honest confidence sets for the unknown step function and its change points. At the same time, we obtain exponential bounds for estimating the change point locations which for example yield the minimax rate up to a log-term. Finally, the simultaneous multiscale change point estimator achieves the optimal detection rate of vanishing signals as n→∞, even for an unbounded number of change points. We illustrate how dynamic programming techniques can be employed for efficient computation of estimators and confidence regions. The performance of the multiscale approach proposed is illustrated by simulations and in two cutting edge applications from genetic engineering and photoemission spectroscopy.

296 citations


Journal ArticleDOI
TL;DR: A new test statistic is introduced that is based on a linear transformation of the data by the precision matrix which incorporates the correlations between the variables and is shown to be particularly powerful against sparse alternatives and enjoys certain optimality.
Abstract: Summary. The paper considers in the high dimensional setting a canonical testing problem in multivariate analysis, namely testing the equality of two mean vectors. We introduce a new test statistic that is based on a linear transformation of the data by the precision matrix which incorporates the correlations between the variables. The limiting null distribution of the test statistic and the power of the test are analysed. It is shown that the test is particularly powerful against sparse alternatives and enjoys certain optimality. A simulation study is carried out to examine the numerical performance of the test and to compare it with other tests given in the literature. The results show that the test proposed significantly outperforms those tests in a range of settings.
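The transformation idea can be sketched as follows, with the caveat that the paper's specific precision-matrix estimator and its calibration of the null distribution are not reproduced; the ridge-regularized inverse and the sandwich standardization below are crude placeholders of my own choosing.

```python
# Hedged sketch of the idea only: a max-type two-sample statistic based on the
# precision-transformed mean difference.  The paper's precision estimator and
# null calibration are not reproduced; the ridge inverse is a placeholder.
import numpy as np

def max_type_stat(X, Y, ridge=0.1):
    n1, n2, p = len(X), len(Y), X.shape[1]
    S = ((n1 - 1) * np.cov(X, rowvar=False)
         + (n2 - 1) * np.cov(Y, rowvar=False)) / (n1 + n2 - 2)
    Omega = np.linalg.inv(S + ridge * np.eye(p))             # placeholder precision estimate
    d = Omega @ (X.mean(axis=0) - Y.mean(axis=0))            # transformed mean difference
    var = np.diag(Omega @ S @ Omega) * (1 / n1 + 1 / n2)     # componentwise variance proxy
    return float(np.max(d ** 2 / var))                       # large values indicate unequal means
```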

250 citations


Journal ArticleDOI
TL;DR: In this article, a pairwise censored likelihood is used for consistent estimation of the extremes of space-time data under mild mixing conditions, and illustrates this by fitting an extension of a model of Schlather (2002) to hourly rainfall data.
Abstract: Max-stable processes are the natural analogues of the generalized extreme-value distribution when modelling extreme events in space and time. Under suitable conditions, these processes are asymptotically justified models for maxima of independent replications of random fields, and they are also suitable for the modelling of extreme measurements over high thresholds. This paper shows how a pairwise censored likelihood can be used for consistent estimation of the extremes of space-time data under mild mixing conditions, and illustrates this by fitting an extension of a model of Schlather (2002) to hourly rainfall data. A block bootstrap procedure is used for uncertainty assessment. Estimator efficiency is considered and the choice of pairs to be included in the pairwise likelihood is discussed. The proposed model fits the data better than some natural competitors.

184 citations


Journal ArticleDOI
TL;DR: In this paper, a class of regularized matrix regression methods based on spectral regularization is proposed, and a highly efficient and scalable estimation algorithm is developed to facilitate model selection along the regularization path.
Abstract: Modern technologies are producing a wealth of data with complex structures. For instance, in two-dimensional digital imaging, flow cytometry and electroencephalography, matrix-type covariates frequently arise when measurements are obtained for each combination of two underlying variables. To address scientific questions arising from those data, new regression methods that take matrices as covariates are needed, and sparsity or other forms of regularization are crucial owing to the ultrahigh dimensionality and complex structure of the matrix data. The popular lasso and related regularization methods hinge on the sparsity of the true signal in terms of the number of its non-zero coefficients. However, for the matrix data, the true signal is often of, or can be well approximated by, a low rank structure. As such, the sparsity is frequently in the form of low rank of the matrix parameters, which may seriously violate the assumption of the classical lasso. We propose a class of regularized matrix regression methods based on spectral regularization. A highly efficient and scalable estimation algorithm is developed, and a degrees-of-freedom formula is derived to facilitate model selection along the regularization path. Superior performance of the method proposed is demonstrated on both synthetic and real examples.
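Spectral regularization of a matrix coefficient typically revolves around the nuclear norm, whose proximal operator is singular-value soft thresholding. The sketch below shows that building block and one proximal-gradient step for a least squares matrix regression; it is illustrative only and does not reproduce the paper's algorithm, its broader class of spectral penalties, or its degrees-of-freedom formula.

```python
# Hedged sketch: singular-value soft thresholding, the proximal operator of the
# nuclear-norm penalty lam * ||B||_*, plus one proximal-gradient step for
# sum_i (y_i - <X_i, B>)^2 / 2 + lam * ||B||_* .
import numpy as np

def svt(B, lam):
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt        # shrink singular values

def prox_grad_step(B, X_list, y, lam, step):
    grad = sum((np.tensordot(Xi, B) - yi) * Xi for Xi, yi in zip(X_list, y))
    return svt(B - step * grad, step * lam)
```

Iterating `prox_grad_step` with a suitable step size is a standard way to fit nuclear-norm-penalized models; the low-rank structure of the fitted coefficient matrix is what replaces entrywise sparsity here.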

162 citations


Journal ArticleDOI
TL;DR: A non‐parametric model to describe the dynamics of multicomponent periodicity is proposed and the recently developed synchro‐squeezing transform is investigated in extracting these features in the presence of a trend and heteroscedastic dependent errors.
Abstract: Summary Periodicity and trend are features describing an observed sequence, and extracting these features is an important issue in many scientific fields. However, it is not an easy task for existing methods to analyse simultaneously the trend and dynamics of the periodicity such as time varying frequency and amplitude, and the adaptivity of the analysis to such dynamics and robustness to heteroscedastic dependent errors are not guaranteed. These tasks become even more challenging when there are multiple periodic components. We propose a non-parametric model to describe the dynamics of multicomponent periodicity and investigate the recently developed synchro-squeezing transform in extracting these features in the presence of a trend and heteroscedastic dependent errors. The identifiability problem of the non-parametric periodicity model is studied, and the adaptivity and robustness properties of the synchro-squeezing transform are theoretically justified in both discrete and continuous time settings. Consequently we have a new technique for decoupling the trend, periodicity and heteroscedastic, dependent error process in a general non-parametric set-up. Results of a series of simulations are provided, and the incidence time series of varicella and herpes zoster in Taiwan and respiratory signals observed from a sleep study are analysed.

Journal ArticleDOI
TL;DR: This work derives necessary and sufficient conditions on summary statistics for the corresponding Bayes factor to be convergent, namely to select the true model asymptotically; the conditions amount to the expectations of the summary statistics differing asymptotically under the two models.
Abstract: The choice of the summary statistics that are used in Bayesian inference and in particular in approximate Bayesian computation algorithms has bearings on the validation of the resulting inference. Those statistics are nonetheless customarily used in approximate Bayesian computation algorithms without consistency checks. We derive necessary and sufficient conditions on summary statistics for the corresponding Bayes factor to be convergent, namely to select the true model asymptotically. Those conditions, which amount to the expectations of the summary statistics differing asymptotically under the two models, are quite natural and can be exploited in approximate Bayesian computation settings to infer whether or not a choice of summary statistics is appropriate, via a Monte Carlo validation.
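The Monte Carlo validation alluded to at the end can be sketched directly: simulate the chosen summary statistic under each candidate model and check whether its expectations appear to separate. The two toy models and the variance summary below are hypothetical; they illustrate a summary whose expectations coincide across models and which therefore cannot drive a consistent Bayes factor.

```python
# Hedged sketch of a Monte Carlo check on whether a summary statistic's
# expectations differ under two candidate models (the condition needed for the
# ABC Bayes factor to be consistent).  The toy models are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000

def summary(x):                       # candidate summary: the sample variance
    return x.var(ddof=1)

s_m1 = np.array([summary(rng.normal(0, 1, n)) for _ in range(reps)])                 # model 1
s_m2 = np.array([summary(rng.laplace(0, 1 / np.sqrt(2), n)) for _ in range(reps)])   # model 2

# Both models have unit variance, so the Monte Carlo means are indistinguishable:
# this summary cannot discriminate between the models and is a poor basis for
# ABC model choice.
print(s_m1.mean(), s_m2.mean())
```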

Journal ArticleDOI
TL;DR: The Bayesian bridge model outperforms its classical cousin in estimation and prediction across a variety of data sets, both simulated and real, and the Markov chain Monte Carlo algorithm for fitting the bridge model exhibits excellent mixing properties, particularly for the global scale parameter.
Abstract: Summary. We propose the Bayesian bridge estimator for regularized regression and classification. Two key mixture representations for the Bayesian bridge model are developed: a scale mixture of normal distributions with respect to an α-stable random variable; a mixture of Bartlett–Fejér kernels (or triangle densities) with respect to a two-component mixture of gamma random variables. Both lead to Markov chain Monte Carlo methods for posterior simulation, and these methods turn out to have complementary domains of maximum efficiency. The first representation is a well-known result due to West and is the better choice for collinear design matrices. The second representation is new and is more efficient for orthogonal problems, largely because it avoids the need to deal with exponentially tilted stable random variables. It also provides insight into the multimodality of the joint posterior distribution, which is a feature of the bridge model that is notably absent under ridge or lasso-type priors. We prove a theorem that extends this representation to a wider class of densities representable as scale mixtures of beta distributions, and we provide an explicit inversion formula for the mixing distribution. The connections with slice sampling and scale mixtures of normal distributions are explored. On the practical side, we find that the Bayesian bridge model outperforms its classical cousin in estimation and prediction across a variety of data sets, both simulated and real. We also show that the Markov chain Monte Carlo algorithm for fitting the bridge model exhibits excellent mixing properties, particularly for the global scale parameter. This makes for a favourable contrast with analogous Markov chain Monte Carlo algorithms for other sparse Bayesian models. All methods described in this paper are implemented in the R package BayesBridge. An extensive set of simulation results is provided in two on-line supplemental files.
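In illustrative notation, the bridge prior on each regression coefficient and the first (normal scale mixture) representation mentioned above take the form

```latex
% Bridge prior on each coefficient (illustrative notation): exponential power
% with concentration parameter alpha and global scale tau, together with its
% representation as a scale mixture of normals with a stable-type mixing law.
p(\beta_j \mid \alpha, \tau) \;\propto\; \exp\bigl\{ -\,|\beta_j/\tau|^{\alpha} \bigr\},
\qquad
\beta_j \mid \lambda_j, \tau \;\sim\; \mathrm{N}\bigl(0,\; \tau^{2}\lambda_j^{-1}\bigr),
\quad \lambda_j \sim \pi_{\alpha}(\lambda_j),
```

where π_α is the α-dependent mixing distribution of West's representation; α = 1 recovers the lasso (double-exponential) prior, and smaller α concentrates more mass near zero while keeping heavy tails.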

Journal ArticleDOI
TL;DR: A new prediction band is given by combining the idea of ‘conformal prediction’ with non‐parametric conditional density estimation and the proposed estimator, called COPS, always has a finite sample guarantee.
Abstract: Summary We study distribution-free, non-parametric prediction bands with a focus on their finite sample behaviour. First we investigate and develop different notions of finite sample coverage guarantees. Then we give a new prediction band by combining the idea of ‘conformal prediction’ with non-parametric conditional density estimation. The proposed estimator, called COPS (conformal optimized prediction set), always has a finite sample guarantee. Under regularity conditions the estimator converges to an oracle band at a minimax optimal rate. A fast approximation algorithm and a data-driven method for selecting the bandwidth are developed. The method is illustrated in simulated and real data examples.
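For context, the generic split-conformal construction on which such finite sample guarantees rest can be sketched as below; COPS replaces the absolute-residual conformity score with one built from estimated conditional densities, which this sketch does not implement, and the linear base learner is an arbitrary illustrative choice.

```python
# Hedged sketch of generic split-conformal prediction for regression.  COPS
# uses a density-based conformity score instead of absolute residuals, which
# is not reproduced here.
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal(X, y, X_new, alpha=0.1, rng=None):
    rng = np.random.default_rng(rng)
    n = len(y)
    idx = rng.permutation(n)
    tr, cal = idx[: n // 2], idx[n // 2:]
    model = LinearRegression().fit(X[tr], y[tr])
    scores = np.abs(y[cal] - model.predict(X[cal]))           # calibration scores
    k = int(np.ceil((1 - alpha) * (len(cal) + 1)))            # conformal quantile index
    q = np.sort(scores)[min(k, len(cal)) - 1]                 # capped conservatively at the max
    pred = model.predict(X_new)
    return pred - q, pred + q                                 # finite-sample-valid band
```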

Journal ArticleDOI
TL;DR: In this article, the authors formulate the concern about selective inference in its generality, for a very wide class of error rates and for any selection criterion, and present an adjustment of the testing level inside the selected families that retains control of the expected average error over the selected family.
Abstract: Summary In many complex multiple-testing problems the hypotheses are divided into families. Given the data, families with evidence for true discoveries are selected, and hypotheses within them are tested. Neither controlling the error rate in each family separately nor controlling the error rate over all hypotheses together can assure some level of confidence about the filtration of errors within the selected families. We formulate this concern about selective inference in its generality, for a very wide class of error rates and for any selection criterion, and present an adjustment of the testing level inside the selected families that retains control of the expected average error over the selected families.

Journal ArticleDOI
TL;DR: In this paper, a class of semiparametric regression models is proposed, where transformation functions are estimated by regularised optimisation of scoring rules for probabilistic forecasts, e.g. the continuous ranked probability score.
Abstract: The ultimate goal of regression analysis is to obtain information about the conditional distribution of a response given a set of explanatory variables. This goal is, however, seldom achieved because most established regression models only estimate the conditional mean as a function of the explanatory variables and assume that higher moments are not affected by the regressors. The underlying reason for such a restriction is the assumption of additivity of signal and noise. We propose to relax this common assumption in the framework of transformation models. The novel class of semiparametric regression models proposed herein allows transformation functions to depend on explanatory variables. These transformation functions are estimated by regularised optimisation of scoring rules for probabilistic forecasts, e.g. the continuous ranked probability score. The corresponding estimated conditional distribution functions are consistent. Based on applications from different domains, we show that these conditional transformation models are useful for describing possible heteroscedasticity, comparing spatially varying distributions, identifying extreme events, deriving prediction intervals and selecting variables beyond mean regression effects. An empirical investigation based on a heteroscedastic varying coefficient simulation model demonstrates that semiparametric estimation of conditional distribution functions can be more beneficial than kernel-based non-parametric approaches or parametric generalised additive models for location, scale and shape.
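The continuous ranked probability score mentioned as the leading example of a proper scoring rule is, for a predictive distribution function F evaluated at an observed value y,

```latex
% Continuous ranked probability score of a predictive distribution function F
% at the observed value y:
\mathrm{CRPS}(F, y) \;=\; \int_{-\infty}^{\infty}
  \bigl( F(z) - \mathbf{1}\{ y \le z \} \bigr)^{2} \, dz ,
```

so minimizing a regularised empirical average of such scores targets the whole conditional distribution function rather than only its mean.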

Journal ArticleDOI
TL;DR: Simulations based on realistic outlier-contaminated data show that the bias correction proposed often leads to more efficient estimators, and the mean-squared error estimation methods proposed appear to perform well with a variety of outlier robust small area estimators.
Abstract: Recently proposed outlier robust small area estimators can be substantially biased when outliers are drawn from a distribution that has a different mean from that of the rest of the survey data. This naturally leads to the idea of an outlier robust bias correction for these estimators. In this paper we develop this idea and also propose two different analytical mean squared error estimators for the ensuing bias-corrected outlier robust estimators. Simulations based on realistic outlier contaminated data show that the proposed bias correction often leads to more efficient estimators. Furthermore the proposed mean squared error estimators appear to perform well with a variety of outlier robust small area estimators.

Journal ArticleDOI
TL;DR: Conditions are established under which the estimation error of the unknown threshold parameter can be bounded by a factor that is nearly n^{-1} even when the number of regressors can be much larger than the sample size n.

Abstract: We consider a high dimensional regression model with a possible change point due to a covariate threshold and develop the lasso estimator of regression coefficients as well as the threshold parameter. Our lasso estimator not only selects covariates but also selects a model between linear and threshold regression models. Under a sparsity assumption, we derive non-asymptotic oracle inequalities for both the prediction risk and the ℓ1-estimation loss for regression coefficients. Since the lasso estimator selects variables simultaneously, we show that oracle inequalities can be established without pretesting the existence of the threshold effect. Furthermore, we establish conditions under which the estimation error of the unknown threshold parameter can be bounded by a factor that is nearly n^{-1} even when the number of regressors can be much larger than the sample size n. We illustrate the usefulness of our proposed estimation method via Monte Carlo simulations and an application to real data.
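In illustrative notation (not necessarily the paper's), the model is a linear regression with a possible change point at an unknown threshold τ of a scalar covariate q_i,

```latex
% Linear regression with a possible change point at an unknown covariate
% threshold tau (illustrative notation):
y_i \;=\; x_i^{\top}\beta \;+\; x_i^{\top}\delta\,\mathbf{1}\{ q_i > \tau \} \;+\; \varepsilon_i,
\qquad i = 1, \dots, n ,
```

so that δ = 0 recovers the plain linear model, and lasso shrinkage of δ is what performs the selection between the linear and threshold specifications mentioned above.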

Journal ArticleDOI
TL;DR: In this article, the authors derive novel asymptotic expansions of the two well-known principles in misspecified generalized linear models, which give the generalized Bayesian information criterion and generalized Akaike information criterion.
Abstract: Summary Model selection is of fundamental importance to high dimensional modelling featured in many contemporary applications. Classical principles of model selection include the Bayesian principle and the Kullback–Leibler divergence principle, which lead to the Bayesian information criterion and Akaike information criterion respectively, when models are correctly specified. Yet model misspecification is unavoidable in practice. We derive novel asymptotic expansions of the two well-known principles in misspecified generalized linear models, which give the generalized Bayesian information criterion and generalized Akaike information criterion. A specific form of prior probabilities motivated by the Kullback–Leibler divergence principle leads to the generalized Bayesian information criterion with prior probability, GBICp, which can be naturally decomposed as the sum of the negative maximum quasi-log-likelihood, a penalty on model dimensionality, and a penalty on model misspecification directly. Numerical studies demonstrate the advantage of the new methods for model selection in both correctly specified and misspecified models.

Journal ArticleDOI
TL;DR: A fast, well performing and theoretically tractable method for detecting multiple change points in the structure of an auto‐regressive conditional heteroscedastic model for financial returns with piecewise constant parameter values is proposed.
Abstract: The emergence of the recent financial crisis, during which markets frequently underwent changes in their statistical structure over a short period of time, illustrates the importance of non-stationary modelling in financial time series. Motivated by this observation, we propose a fast, well performing and theoretically tractable method for detecting multiple change points in the structure of an auto-regressive conditional heteroscedastic model for financial returns with piecewise constant parameter values. Our method, termed BASTA (binary segmentation for transformed auto-regressive conditional heteroscedasticity), proceeds in two stages: process transformation and binary segmentation. The process transformation decorrelates the original process and lightens its tails; the binary segmentation consistently estimates the change points. We propose and justify two particular transformations and use simulation to fine-tune their parameters as well as the threshold parameter for the binary segmentation stage. A comparative simulation study illustrates good performance in comparison with the state of the art, and the analysis of the Financial Times Stock Exchange FTSE 100 index reveals an interesting correspondence between the estimated change points and major events of the recent financial crisis. Although the method is easy to implement, ready-made R software is provided.
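The second (binary segmentation) stage can be sketched generically with CUSUM statistics, as below; the paper's process transformation, which decorrelates the returns and lightens their tails, and its simulation-tuned threshold are not reproduced, so `thresh` here is a user-supplied constant.

```python
# Hedged sketch of generic CUSUM binary segmentation (the second stage of
# BASTA); the paper's transformation and tuned threshold are not reproduced.
import numpy as np

def cusum(x):
    n = len(x)
    k = np.arange(1, n)
    left = np.cumsum(x)[:-1]                       # partial sums up to each split point
    total = x.sum()
    return np.abs(np.sqrt((n - k) / (n * k)) * left
                  - np.sqrt(k / (n * (n - k))) * (total - left))

def binary_segment(x, thresh, start=0):
    if len(x) < 3:
        return []
    stat = cusum(x)
    if stat.max() < thresh:
        return []                                  # no detectable change in this segment
    b = int(stat.argmax()) + 1                     # estimated change point (local index)
    return (binary_segment(x[:b], thresh, start)
            + [start + b]
            + binary_segment(x[b:], thresh, start + b))
```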

Journal ArticleDOI
TL;DR: In this article, the authors discuss the nonparametric identification of causal direct and indirect effects of a binary treatment based on instrumental variables, and illustrate the approach by disentangling the effects (i) of education on health, which may be mediated by income, and (ii) of the Job Corps training program.
Abstract: This paper discusses the nonparametric identification of causal direct and indirect effects of a binary treatment based on instrumental variables. We identify the indirect effect, which operates through a mediator (i.e. intermediate variable) that is situated on the causal path between the treatment and the outcome, as well as the unmediated direct effect of the treatment using distinct instruments for the endogenous treatment and the endogenous mediator. We examine different settings to obtain nonparametric identification of (natural) direct and indirect as well as controlled direct effects for continuous and discrete mediators and continuous and discrete instruments. We illustrate our approach in two applications: to disentangle the effects (i) of education on health, which may be mediated by income, and (ii) of the Job Corps training program, which may affect earnings indirectly via working longer hours and directly via higher wages per hour.

Journal ArticleDOI
TL;DR: A new regularization framework for structure estimation in the context of reproducing kernel Hilbert spaces is proposed, in which selection and estimation are achieved by penalized least squares using a penalty that encourages sparsity of the additive components.
Abstract: Summary. Functional additive models provide a flexible yet simple framework for regressions involving functional predictors. The utilization of a data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting non-linear additive components has been less studied. In this work, we propose a new regularization framework for structure estimation in the context of reproducing kernel Hilbert spaces. The approach proposed takes advantage of functional principal components which greatly facilitates implementation and theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.

Journal ArticleDOI
TL;DR: In this article, the families of metrics based on densities or on cumulative distribution functions that are monotonic transformation invariant are investigated, and a new generalized family of such metrics is introduced.
Abstract: Summary Monotonic transformations are widely employed in statistics and data analysis. In computer experiments they are often used to gain accuracy in the estimation of global sensitivity statistics. However, one faces the question of interpreting results that are obtained on the transformed data back on the original data. The situation is even more complex in computer experiments, because transformations alter the model input–output mapping and distort the estimators. This work demonstrates that the problem can be solved by utilizing statistics which are monotonic transformation invariant. To do so, we offer an investigation into the families of metrics either based on densities or on cumulative distribution functions that are monotonic transformation invariant and we introduce a new generalized family of metrics. Numerical experiments show that transformations allow numerical convergence in the estimates of global sensitivity statistics, both invariant and not, in cases in which it would otherwise be impossible to obtain convergence. However, one fully exploits the increased numerical accuracy if the global sensitivity statistic is monotonic transformation invariant. Conversely, estimators of measures that do not have this invariance property might lead to misleading deductions.

Journal ArticleDOI
TL;DR: It is shown how one variant of generalized α‐investing can be used to control mFDR in a quality preserving database and to lead to significant reduction in costs compared with naive approaches for controlling the familywise error rate implemented by Aharoni and co‐workers.
Abstract: Summary The increasing prevalence and utility of large public databases necessitates the development of appropriate methods for controlling false discovery. Motivated by this challenge, we discuss the generic problem of testing a possibly infinite stream of null hypotheses. In this context, Foster and Stine suggested a novel method named α-investing for controlling a false discovery measure known as mFDR. We develop a more general procedure for controlling mFDR, of which α-investing is a special case. We show that, in common practical situations, the general procedure can be optimized to produce an expected reward optimal version, which is more powerful than α-investing. We then present the concept of quality preserving databases which was originally introduced by Aharoni and co-workers, which formalizes efficient public database management to save costs and to control false discovery simultaneously. We show how one variant of generalized α-investing can be used to control mFDR in a quality preserving database and to lead to significant reduction in costs compared with naive approaches for controlling the familywise error rate implemented by Aharoni and co-workers.
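For orientation, Foster and Stine's basic α-investing rule, the special case of the generalized procedure developed here, can be sketched as follows; the spending rule and the pay-out value are illustrative choices rather than the reward-optimal version derived in the paper.

```python
# Hedged sketch of basic alpha-investing over a stream of p-values (the special
# case of generalized alpha-investing); the spending rule and pay-out are
# illustrative choices, not the paper's optimized procedure.
def alpha_investing(pvals, w0=0.05, payout=0.05):
    wealth, rejections = w0, []
    for j, p in enumerate(pvals):
        if wealth <= 0:
            break                                  # alpha-wealth exhausted, stop testing
        alpha_j = wealth / (1 + j)                 # illustrative spending rule
        if p <= alpha_j:
            rejections.append(j)
            wealth += payout                       # earn wealth back on a discovery
        else:
            wealth -= alpha_j / (1 - alpha_j)      # pay for a non-discovery
    return rejections
```

Generalized α-investing, as developed in the paper, allows more flexible pay-out and spending schemes while still controlling mFDR, and that flexibility is what the quality preserving database construction exploits.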

Journal ArticleDOI
TL;DR: In this paper, the identifiability of the number of components in k-variate, M-component finite mixture models was analyzed, and a procedure was developed to consistently estimate a lower bound.
Abstract: This article analyzes the identifiability of the number of components in k-variate, M-component finite mixture models in which each component distribution has independent marginals, including models in latent class analysis. Without making parametric assumptions on the component distributions, we investigate how one can identify the number of components from the distribution function of the observed data. When k ≥ 2, a lower bound on the number of components (M) is nonparametrically identifiable from the rank of a matrix constructed from the distribution function of the observed variables. Building on this identification condition, we develop a procedure to consistently estimate a lower bound on the number of components.

Journal ArticleDOI
TL;DR: In this article, a treatment effect cross-validation approach aimed at minimizing treatment effect estimation errors is proposed, and the methods are illustrated by using simulation studies and data from a clinical trial of patients with human immunodeficiency virus.
Abstract: Summary Researchers often believe that a treatment's effect on a response may be heterogeneous with respect to certain baseline covariates. This is an important premise of personalized medicine. Several methods for estimating heterogeneous treatment effects have been proposed. However, little attention has been given to the problem of choosing between estimators of treatment effects. Models that best estimate the regression function may not be best for estimating the effect of a treatment; therefore, there is a need for model selection methods that are targeted to treatment effect estimation. We demonstrate an application of the focused information criterion in this setting and develop a treatment effect cross-validation aimed at minimizing treatment effect estimation errors. Theoretically, treatment effect cross-validation has a model selection consistency property when the data splitting ratio is properly chosen. Practically, treatment effect cross-validation has the flexibility to compare different types of models. We illustrate the methods by using simulation studies and data from a clinical trial comparing treatments of patients with human immunodeficiency virus.

Journal ArticleDOI
TL;DR: In this article, the authors consider sparse regression with a hard thresholding penalty, which is shown to give rise to thresholded regression, and identify the optimal choice of the ridge parameter, which has simultaneous advantages to both the L2-loss and the prediction loss.
Abstract: Summary. High dimensional sparse modelling via regularization provides a powerful tool for analysing large-scale data sets and obtaining meaningful interpretable models. The use of nonconvex penalty functions shows advantage in selecting important features in high dimensions, but the global optimality of such methods still demands more understanding. We consider sparse regression with a hard thresholding penalty, which we show to give rise to thresholded regression. This approach is motivated by its close connection with L0-regularization, which can be unrealistic to implement in practice but of appealing sampling properties, and its computational advantage. Under some mild regularity conditions allowing possibly exponentially growing dimensionality, we establish the oracle inequalities of the resulting regularized estimator, as the global minimizer, under various prediction and variable selection losses, as well as the oracle risk inequalities of the hard thresholded estimator followed by further L2-regularization. The risk properties exhibit interesting shrinkage effects under both estimation and prediction losses. We identify the optimal choice of the ridge parameter, which is shown to have simultaneous advantages to both the L2-loss and the prediction loss. These new results and phenomena are evidenced by simulation and real data examples.

Journal ArticleDOI
TL;DR: In this article, a functional linear manifold model was proposed for the analysis of longitudinally observed behavioural patterns of flying, feeding, walking and resting over the lifespan of Drosophila flies and also investigated in simulations.
Abstract: Summary Multivariate functional data are increasingly encountered in data analysis, whereas statistical models for such data are not well developed yet. Motivated by a case-study where one aims to quantify the relationship between various longitudinally recorded behaviour intensities for Drosophila flies, we propose a functional linear manifold model. This model reflects the functional dependence between the components of multivariate random processes and is defined through data-determined linear combinations of the multivariate component trajectories, which are characterized by a set of varying-coefficient functions. The time varying linear relationships that govern the components of multivariate random functions yield insights about the underlying processes and also lead to noise-reduced representations of the multivariate component trajectories. The functional linear manifold model proposed is put to the task for an analysis of longitudinally observed behavioural patterns of flying, feeding, walking and resting over the lifespan of Drosophila flies and is also investigated in simulations.

Journal ArticleDOI
TL;DR: In this article, the estimation efficiency of the central mean subspace in the framework of sufficient dimension reduction is investigated and the semiparametric efficient score is derived and studied in practice.
Abstract: Summary We investigate the estimation efficiency of the central mean subspace in the framework of sufficient dimension reduction. We derive the semiparametric efficient score and study its practical applicability. Despite the difficulty caused by the potential high dimension issue in the variance component, we show that locally efficient estimators can be constructed in practice. We conduct simulation studies and a real data analysis to demonstrate the finite sample performance and gain in efficiency of the proposed estimators in comparison with several existing methods.

Journal ArticleDOI
TL;DR: In this article, the bias reducing adjusted score equations of Firth in 1993 are obtained, whose solution ensures an estimator with smaller asymptotic bias than the maximum likelihood estimator.
Abstract: For the estimation of cumulative link models for ordinal data, the bias reducing adjusted score equations of Firth in 1993 are obtained, whose solution ensures an estimator with smaller asymptotic bias than the maximum likelihood estimator. Their form suggests a parameter-dependent adjustment of the multinomial counts, which in turn suggests the solution of the adjusted score equations through iterated maximum likelihood fits on adjusted counts, greatly facilitating implementation. Like the maximum likelihood estimator, the reduced bias estimator is found to respect the invariance properties that make cumulative link models a good choice for the analysis of categorical data. Its additional finiteness and optimal frequentist properties, along with the adequate behaviour of related asymptotic inferential procedures, make the reduced bias estimator attractive as a default choice for practical applications. Furthermore, the estimator proposed enjoys certain shrinkage properties that are defensible from an experimental point of view relating to the nature of ordinal data.
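As background (not specific to this paper's derivation): Firth-type bias reduction works with adjusted score equations in general, and in canonically parametrized full exponential families it is equivalent to maximizing a Jeffreys-prior-penalized log-likelihood,

```latex
% Firth-type bias reduction: adjusted score equations in general; the Jeffreys
% penalty form holds in canonically parametrized full exponential families.
U^{*}(\theta) \;=\; U(\theta) + A(\theta) \;=\; 0,
\qquad
\ell^{*}(\theta) \;=\; \ell(\theta) + \tfrac{1}{2}\log\bigl|\, I(\theta) \,\bigr|
\quad \text{(exponential-family case)},
```

where U is the score, I(θ) the expected information and A(θ) the bias-reducing adjustment; for cumulative link models the paper works directly with the adjusted score equations, and the parameter-dependent adjustment of the multinomial counts is what makes solving them by iterated maximum likelihood fits practical.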