
Showing papers in "Biometrika in 2009"


Journal ArticleDOI
TL;DR: In this paper, the authors developed a systematic approach to address the lack of overlap in the covariate distributions between treatment groups, which can lead to imprecise estimates and can make commonly used estimators sensitive to the choice of specification.
Abstract: SUMMARY Estimation of average treatment effects under unconfounded or ignorable treatment assignment is often hampered by lack of overlap in the covariate distributions between treatment groups. This lack of overlap can lead to imprecise estimates, and can make commonly used estimators sensitive to the choice of specification. In such cases researchers have often used ad hoc methods for trimming the sample. We develop a systematic approach to addressing lack of overlap. We characterize optimal subsamples for which the average treatment effect can be estimated most precisely. Under some conditions, the optimal selection rules depend solely on the propensity score. For a wide range of distributions, a good approximation to the optimal rule is provided by the simple rule of thumb to discard all units with estimated propensity scores outside the range [0·1, 0·9].
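
As a concrete illustration of the [0·1, 0·9] rule of thumb described above, here is a minimal sketch (hypothetical variable names, not the authors' code) of trimming a sample by estimated propensity score:

```python
import numpy as np

def trim_by_propensity(e_hat, low=0.1, high=0.9):
    """Keep units whose estimated propensity score lies in [low, high]."""
    e_hat = np.asarray(e_hat, dtype=float)
    return (e_hat >= low) & (e_hat <= high)

# Toy usage: e_hat stands in for propensity scores fitted elsewhere.
rng = np.random.default_rng(0)
e_hat = rng.uniform(0.0, 1.0, size=1000)
keep = trim_by_propensity(e_hat)
print(f"retained {keep.mean():.1%} of the sample")
```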

847 citations


Journal ArticleDOI
TL;DR: Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version, but that version introduces a bias; an alternative based on genuine importance sampling bypasses this difficulty and compares favourably with two other versions of the approximate algorithm.
Abstract: Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappe et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm.
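
As background for the importance-sampling argument, the generic population Monte Carlo weight, when the proposal at iteration t is a mixture of forward kernels centred at the previous weighted particles, is shown below (generic notation and a sketch only; the paper's algorithm also scales the forward kernel adaptively):

```latex
% Generic population Monte Carlo importance weight for ABC particles
% theta_i^{(t)} accepted at the current tolerance (background sketch,
% not the paper's exact notation).
w_i^{(t)} \;\propto\;
  \frac{\pi\!\bigl(\theta_i^{(t)}\bigr)}
       {\sum_{j=1}^{N} w_j^{(t-1)}\,
        K_{\tau_t}\!\bigl(\theta_i^{(t)} \mid \theta_j^{(t-1)}\bigr)},
\qquad i = 1, \dots, N.
```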

541 citations


Journal ArticleDOI
Chris Hans1
TL;DR: New aspects of the broader Bayesian treatment of lasso regression are introduced, and it is shown that the standard lasso prediction method does not necessarily agree with model-based, Bayesian predictions.
Abstract: Summary The lasso estimate for linear regression corresponds to a posterior mode when independent, double-exponential prior distributions are placed on the regression coefficients. This paper introduces new aspects of the broader Bayesian treatment of lasso regression. A direct characterization of the regression coefficients' posterior distribution is provided, and computation and inference under this characterization is shown to be straightforward. Emphasis is placed on point estimation using the posterior mean, which facilitates prediction of future observations via the posterior predictive distribution. It is shown that the standard lasso prediction method does not necessarily agree with model-based, Bayesian predictions. A new Gibbs sampler for Bayesian lasso regression is introduced.
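
For reference, the correspondence in the first sentence can be written out as follows (standard formulation in generic notation, not copied from the paper): with Gaussian errors of variance σ² and independent double-exponential priors on the coefficients, the posterior mode solves the lasso problem, with λ = σ²·λ̃ linking the two forms.

```latex
% Lasso estimate as a posterior mode under double-exponential priors
% (generic notation; lambda = sigma^2 * lambda_tilde links the two forms).
\hat\beta_{\mathrm{lasso}}
 = \arg\min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta\rVert_2^2
   + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert
 = \arg\max_{\beta}\ \exp\!\Bigl\{-\tfrac{1}{2\sigma^{2}}\lVert y - X\beta\rVert_2^{2}\Bigr\}
   \prod_{j=1}^{p} \tfrac{\tilde\lambda}{2}\,e^{-\tilde\lambda\lvert\beta_j\rvert}.
```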

457 citations


Journal ArticleDOI
TL;DR: This work proposes alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero.
Abstract: Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero.
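
The "usual doubly robust estimator" referred to here is, in the standard missing-response setting (generic notation, stated as background rather than taken from the paper):

```latex
% Augmented inverse probability weighted (doubly robust) estimator of
% mu = E(Y), with R the observation indicator, pi-hat the estimated
% propensity score and m-hat the estimated outcome regression.
\hat\mu_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}
  \Bigl\{ \frac{R_i Y_i}{\hat\pi(X_i)}
        - \frac{R_i - \hat\pi(X_i)}{\hat\pi(X_i)}\,\hat m(X_i) \Bigr\}.
```

The 1/π̂(X_i) factors make clear why estimated propensity scores near zero can destabilize this estimator, which is the problem the proposed alternatives target.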

329 citations


Journal ArticleDOI
TL;DR: The proposed group bridge approach is a penalized regularization method that uses a specially designed group bridge penalty that has the oracle group selection property, in that it can correctly select important groups with probability converging to one.
Abstract: In multiple regression problems when covariates can be naturally grouped, it is important to carry out feature selection at the group and within-group individual variable levels simultaneously. The existing methods, including the lasso and group lasso, are designed for either variable selection or group selection, but not for both. We propose a group bridge approach that is capable of simultaneous selection at both the group and within-group individual variable levels. The proposed approach is a penalized regularization method that uses a specially designed group bridge penalty. It has the oracle group selection property, in that it can correctly select important groups with probability converging to one. In contrast, the group lasso and group least angle regression methods in general do not possess such an oracle property in group selection. Simulation studies indicate that the group bridge has superior performance in group and individual variable selection relative to several existing methods.
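
The group bridge penalty takes a form along the following lines (written from memory in generic notation, so treat the details as an assumption rather than the paper's exact definition): an ℓ1 sum within each group raised to a concave power.

```latex
% Group bridge penalty for predefined groups A_1, ..., A_K with weights c_k
% (generic form; 0 < gamma < 1 gives the concave, group-sparsifying bridge).
\mathrm{pen}_{\lambda}(\beta) = \lambda \sum_{k=1}^{K} c_k
  \Bigl( \sum_{j \in A_k} \lvert\beta_j\rvert \Bigr)^{\gamma},
\qquad 0 < \gamma < 1.
```

The outer concave power can set whole groups to zero, while the inner ℓ1 sum still allows individual coefficients within a retained group to be zero, giving the simultaneous two-level selection described above.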

329 citations


Journal ArticleDOI
TL;DR: In this article, the authors study the class of penalized spline estimators, which enjoy similarities to both regression splines, without penalty and with fewer knots than data points, and smoothing splines with knots equal to the data points and a penalty controlling the roughness of the fit.
Abstract: We study the class of penalized spline estimators, which enjoy similarities to both regression splines, without penalty and with fewer knots than data points, and smoothing splines, with knots equal to the data points and a penalty controlling the roughness of the fit. Depending on the number of knots, sample size and penalty, we show that the theoretical properties of penalized regression spline estimators are either similar to those of regression splines or to those of smoothing splines, with a clear breakpoint distinguishing the cases. We prove that using fewer knots results in better asymptotic rates than when using a large number of knots. We obtain expressions for bias and variance and asymptotic rates for the number of knots and penalty parameter.
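
The estimators studied are of the familiar penalized least squares type over a spline basis; a generic form (background only, not the paper's notation) is

```latex
% Penalized spline estimator: K basis functions B_1, ..., B_K determined by
% the knots, penalty parameter lambda and a fixed penalty matrix D.
\hat\beta = \arg\min_{\beta}\ \sum_{i=1}^{n}
  \Bigl\{ y_i - \sum_{k=1}^{K} \beta_k B_k(x_i) \Bigr\}^{2}
  + \lambda\, \beta^{\top} D\, \beta,
```

with λ = 0 and small K giving a regression spline, and K comparable to n with λ > 0 giving a smoothing spline, the two regimes between which the paper locates its breakpoint.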

224 citations


Journal ArticleDOI
TL;DR: The sinh-arcsinh transformation is applied to a generating distribution with no parameters other than location and scale, usually the normal, to produce a new four-parameter family of sinh-arcsinh distributions that allows for tailweights both heavier and lighter than those of the generating distribution.
Abstract: We introduce the sinh-arcsinh transformation and hence, by applying it to a generating distribution with no parameters other than location and scale, usually the normal, a new family of sinh-arcsinh distributions. This four-parameter family has symmetric and skewed members and allows for tailweights that are both heavier and lighter than those of the generating distribution. The central place of the normal distribution in this family affords likelihood ratio tests of normality that are superior to the state-of-the-art in normality testing because of the range of alternatives against which they are very powerful. Likelihood ratio tests of symmetry are also available and are very successful. Three-parameter symmetric and asymmetric subfamilies of the full family are also of interest. Heavy-tailed symmetric sinh-arcsinh distributions behave like Johnson SU distributions, while their light-tailed counterparts behave like sinh-normal distributions, the sinh-arcsinh family allowing a seamless transition between the two, via the normal, controlled by a single parameter. The sinh-arcsinh family is very tractable and many properties are explored. Likelihood inference is pursued, including an attractive reparameterization. Illustrative examples are given. A multivariate version is considered. Options and extensions are discussed.
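
A sketch of the transformation in one common parameterization (location and scale omitted; the exact placement and signs of the parameters are my reading and may differ from the paper's notation):

```latex
% Sinh-arcsinh random variable generated from a standard normal Z
% (location-scale omitted; parameterization stated as an assumption).
X = \sinh\!\Bigl\{ \frac{\sinh^{-1}(Z) + \epsilon}{\delta} \Bigr\},
\qquad \delta > 0,\ \epsilon \in \mathbb{R},
```

where ε controls asymmetry, δ controls tailweight, and ε = 0, δ = 1 returns X = Z, placing the normal at the centre of the family as described above.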

218 citations


Journal ArticleDOI
TL;DR: In this article, a more general family of bias-reducing adjustments is developed for a broad class of univariate and multivariate generalized nonlinear models, and a necessary and sufficient condition is given for the existence of a penalized likelihood interpretation of the method.
Abstract: In Firth (1993, Biometrika) it was shown how the leading term in the asymptotic bias of the maximum likelihood estimator is removed by adjusting the score vector, and that in canonical-link generalized linear models the method is equivalent to maximizing a penalized likelihood that is easily implemented via iterative adjustment of the data. Here a more general family of bias-reducing adjustments is developed for a broad class of univariate and multivariate generalized nonlinear models. The resulting formulae for the adjusted score vector are computationally convenient, and in univariate models they directly suggest implementation through an iterative scheme of data adjustment. For generalized linear models a necessary and sufficient condition is given for the existence of a penalized likelihood interpretation of the method. An illustrative application to the Goodman row-column association model shows how the computational simplicity and statistical benefits of bias reduction extend beyond generalized linear models.
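
For the canonical-link generalized linear model case recalled in the first sentence, the penalized likelihood in question is the Jeffreys-prior penalized log-likelihood (standard result, stated here as background):

```latex
% Firth-type bias-reduced estimation in canonical-link GLMs: maximize the
% log-likelihood penalized by half the log-determinant of the Fisher information.
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\log\bigl\lvert I(\beta) \bigr\rvert,
```

so that the adjusted score is the gradient of ℓ*; the paper's contribution is a broader family of such adjustments beyond this special case.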

196 citations


Journal ArticleDOI
TL;DR: A novel iterative system is developed to build a statistical model of dynamic computer codes, which is demonstrated on a rainfall-runoff simulator.
Abstract: Computer codes are used in scientific research to study and predict the behaviour of complex systems. Their run times often make uncertainty and sensitivity analyses impractical because of the thousands of runs that are conventionally required, so efficient techniques have been developed based on a statistical representation of the code. The approach is less straightforward for dynamic codes, which represent time-evolving systems. We develop a novel iterative system to build a statistical model of dynamic computer codes, which is demonstrated on a rainfall-runoff simulator.
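
To make the idea concrete, here is a heavily simplified sketch of single-step emulation of a dynamic code: fit an emulator of one time-step and then iterate it to propagate a trajectory. The toy simulator and all names are hypothetical, the emulator is scikit-learn's generic Gaussian process regressor, and the paper's iterative system is considerably more sophisticated than this.

```python
# Single-step emulation of a dynamic code: learn state_{t+1} = f(state_t, forcing_t)
# from a few runs, then iterate the cheap emulator instead of the code.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def simulator_step(state, forcing):
    """Toy stand-in for one expensive time-step of a dynamic computer code."""
    return 0.9 * state + 0.5 * forcing + 0.05 * np.sin(state)

# Training data: single-step input/output pairs harvested from short runs.
rng = np.random.default_rng(1)
states = rng.uniform(-2.0, 2.0, size=200)
forcings = rng.uniform(0.0, 1.0, size=200)
X_train = np.column_stack([states, forcings])
y_train = simulator_step(states, forcings)

emulator = GaussianProcessRegressor(normalize_y=True).fit(X_train, y_train)

# Iterate the emulator in place of the code for a new forcing series.
forcing_series = rng.uniform(0.0, 1.0, size=50)
state, trajectory = 0.0, []
for f in forcing_series:
    state = float(emulator.predict(np.array([[state, f]]))[0])
    trajectory.append(state)
print(trajectory[:5])
```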

193 citations


Journal ArticleDOI
TL;DR: This work models pairwise dependence of temporal maxima, such as annual maxima of precipitation, that have been recorded in space either on a regular grid or at irregularly spaced locations, and proposes a simple connection between extreme value theory and geostatistics.
Abstract: We model pairwise dependence of temporal maxima, such as annual maxima of precipitation, that have been recorded in space, either on a regular grid or at irregularly spaced locations. The construction of our estimators stems from the variogram concept. The asymptotic properties of our pairwise dependence estimators are established through properties of empirical processes. The performance of our approach is illustrated by simulations and by the treatment of a real dataset. In addition to bringing new results about the asymptotic behaviour of copula estimators, the latter being linked to first-order variograms, one main advantage of our approach is to propose a simple connection between extreme value theory and geostatistics.
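
One variogram-type quantity commonly used for this purpose is the F-madogram and its link to the pairwise extremal coefficient (stated here from memory as background; the paper's estimators may be defined differently):

```latex
% F-madogram of a max-stable field M with marginal distribution F, and its
% relation to the pairwise extremal coefficient theta(h) (background only).
\nu_F(h) = \tfrac{1}{2}\,E\bigl\lvert F\{M(s+h)\} - F\{M(s)\}\bigr\rvert,
\qquad
\theta(h) = \frac{1 + 2\nu_F(h)}{1 - 2\nu_F(h)}.
```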

147 citations


Journal ArticleDOI
TL;DR: In this article, a default version of the hyper-inverse Wishart prior for restricted covariance matrices, called the hyper-inverse Wishart g-prior, was developed, which corresponds to the implied fractional prior for selecting a graph using fractional Bayes factors.
Abstract: SUMMARY This paper presents a default model-selection procedure for Gaussian graphical models that involves two new developments. First, we develop a default version of the hyper-inverse Wishart prior for restricted covariance matrices, called the hyper-inverse Wishart g-prior, and show how it corresponds to the implied fractional prior for selecting a graph using fractional Bayes factors. Second, we apply a class of priors that automatically handles the problem of multiple hypothesis testing. We demonstrate our methods on a variety of simulated examples, concluding with a real example analyzing covariation in mutual-fund returns. These studies reveal that the combined use of a multiplicity-correction prior on graphs and fractional Bayes factors for computing marginal likelihoods yields better performance than existing Bayesian methods.

Journal ArticleDOI
TL;DR: This work proposes an approach to constructing nested Latin hypercube designs that can accommodate any number of factors and extends this method to construct nested Latin hypercube designs with more than two layers.
Abstract: We propose an approach to constructing nested Latin hypercube designs. Such designs are useful for conducting multiple computer experiments with different levels of accuracy. A nested Latin hypercube design with two layers is defined to be a special Latin hypercube design that contains a smaller Latin hypercube design as a subset. Our method is easy to implement and can accommodate any number of factors. We also extend this method to construct nested Latin hypercube designs with more than two layers. Illustrative examples are given. Some statistical properties of the constructed designs are derived.
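
To illustrate only the definition of a two-layer nested Latin hypercube design (a Latin hypercube containing a smaller one as a subset), here is a simple sketch of one way to build such a pair; the function name is hypothetical and this is not the construction proposed in the paper.

```python
# Illustrative two-layer nested Latin hypercube of levels: an n-run Latin
# hypercube whose first m rows, after collapsing each block of n/m large
# levels to one small level, form an m-run Latin hypercube.
import numpy as np

def nested_lhd(m, t, d, rng=None):
    """Return (small, large): an m x d LHD nested in an (m*t) x d LHD of levels."""
    rng = np.random.default_rng(rng)
    n = m * t
    small = np.column_stack([rng.permutation(m) for _ in range(d)])
    large = np.empty((n, d), dtype=int)
    for j in range(d):
        # Small level l owns the block of large levels {l*t, ..., l*t + t - 1};
        # place each nested point in a random level of its own block.
        nested_levels = small[:, j] * t + rng.integers(0, t, size=m)
        remaining = np.setdiff1d(np.arange(n), nested_levels)
        large[:m, j] = nested_levels
        large[m:, j] = rng.permutation(remaining)
    return small, large

small, large = nested_lhd(m=4, t=3, d=2, rng=0)
# Each column of `large` uses every level 0..11 exactly once, and collapsing
# the first 4 rows by integer division by 3 recovers `small`.
print(large[:4] // 3)
print(small)
```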

Journal ArticleDOI
TL;DR: The results for Poisson log-linear regression models of Davis et al. (2000), negative binomial logit regression models and other similarly specified generalized linear models are unified in a common framework.
Abstract: We study generalized linear models for time series of counts, where serial dependence is introduced through a dependent latent process in the link function. Conditional on the covariates and the latent process, the observation is modelled by a negative binomial distribution. To estimate the regression coefficients, we maximize the pseudolikelihood that is based on a generalized linear model with the latent process suppressed. We show the consistency and asymptotic normality of the generalized linear model estimator when the latent process is a stationary strongly mixing process. We extend the asymptotic results to generalized linear models for time series, where the observation variable, conditional on covariates and a latent process, is assumed to have a distribution from a one-parameter exponential family. Thus, we unify in a common framework the results for Poisson log-linear regression models of Davis et al. (2000), negative binomial logit regression models and other similarly specified generalized linear models.

Journal ArticleDOI
TL;DR: This paper proposes the covariate-adjusted receiver operating characteristic curve, a measure of classification accuracy that accounts for covariates associated with the marker of interest, and characterizes the age-adjusted discriminatory accuracy of prostate-specific antigen as a biomarker for prostate cancer.
Abstract: Recent scientific and technological innovations have produced an abundance of potential markers that are being investigated for their use in disease screening and diagnosis. In evaluating these markers, it is often necessary to account for covariates associated with the marker of interest. Covariates may include subject characteristics, expertise of the test operator, test procedures or aspects of specimen handling. In this paper, we propose the covariate-adjusted receiver operating characteristic curve, a measure of covariate-adjusted classification accuracy. Nonparametric and semiparametric estimators are proposed, asymptotic distribution theory is provided and finite sample performance is investigated. For illustration we characterize the age-adjusted discriminatory accuracy of prostate-specific antigen as a biomarker for prostate cancer.
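
A common way of defining such a covariate-adjusted ROC curve is through case placement values computed against the covariate-specific control distribution (stated as background; the paper's definition and estimators may differ in detail):

```latex
% Covariate-adjusted ROC curve via placement values: F_{Dbar,x} is the control
% marker distribution given covariates x, and (Y_D, X_D) a case observation.
\mathrm{AROC}(t) = P\bigl\{ 1 - F_{\bar D, X_D}(Y_D) \le t \bigr\},
\qquad t \in (0, 1).
```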

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a method for constructing a rich class of designs that are suitable for use in computer experiments, including Latin hypercube designs and two-level fractional factorial designs.
Abstract: We introduce a method for constructing a rich class of designs that are suitable for use in computer experiments. The designs include Latin hypercube designs and two-level fractional factorial designs as special cases and fill the vast vacuum between these two familiar classes of designs. The basic construction method is simple, building a series of larger designs based on a given small design. If the base design is orthogonal, the resulting designs are orthogonal; likewise, if the base design is nearly orthogonal, the resulting designs are nearly orthogonal. We present two generalizations of our basic construction method. The first generalization improves the projection properties of the basic method; the second generalization gives rise to designs that have smaller correlations. Sample constructions are presented and properties of these designs are discussed.

Journal ArticleDOI
TL;DR: In this article, the authors evaluate the effects of data dimension on the asymptotic normality of the empirical likelihood ratio for high-dimensional data under a general multivariate model.
Abstract: SUMMARY We evaluate the effects of data dimension on the asymptotic normality of the empirical likelihood ratio for high-dimensional data under a general multivariate model. Data dimension and dependence among components of the multivariate random vector affect the empirical likelihood directly through the trace and the eigenvalues of the covariance matrix. The growth rates to infinity we obtain for the data dimension improve the rates of Hjort et al. (2008).
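
For reference, the empirical likelihood ratio for a p-dimensional mean μ is the standard profile quantity below; the paper asks how fast p may grow with n while a suitably standardized version of −2 log R(μ) remains asymptotically normal.

```latex
% Empirical likelihood ratio for the mean of X_1, ..., X_n in R^p
% (standard definition, given as background).
R(\mu) = \max\Bigl\{ \prod_{i=1}^{n} n w_i \;:\;
  w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i (X_i - \mu) = 0 \Bigr\}.
```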

Journal ArticleDOI
Hao Wang1, Mike West1
TL;DR: This work presents Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters of matrix normal graphical models.
Abstract: We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling.
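
The matrix-variate normal with separable covariance that underlies these models can be written as follows (standard definition in generic notation):

```latex
% Matrix normal distribution: U (p x p) row covariance, V (q x q) column
% covariance; vec stacks columns.
X \sim N_{p \times q}(M, U, V)
\;\Longleftrightarrow\;
\operatorname{vec}(X) \sim N\bigl(\operatorname{vec}(M),\, V \otimes U\bigr).
```

Graphical model structuring of these covariance parameters is typically encoded as zeros in the corresponding precision matrices, which is what induces the conditional independencies mentioned above.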

Journal ArticleDOI
TL;DR: It is shown that the large Latin hypercube inherits the exact or near orthogonality of the small Latin hypercube, so effort in searching for large Latin hypercubes can be focussed on finding small Latin hypercubes with the same property.
Abstract: We propose a method for constructing orthogonal or nearly orthogonal Latin hypercubes. The method yields a large Latin hypercube by coupling an orthogonal array of index unity with a small Latin hypercube. It is shown that the large Latin hypercube inherits the exact or near orthogonality of the small Latin hypercube. Thus, effort for searching for large Latin hypercubes, that are exactly or nearly orthogonal, can be focussed on finding small Latin hypercubes with the same property. We obtain a useful collection of orthogonal or nearly orthogonal Latin hypercubes, which have a large factor-to-run ratio and the results are often much more economical than existing methods.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a method for constructing orthogonal Latin hypercube designs in which all the linear terms are orthogonal not only to each other, but also to the quadratic terms.
Abstract: SUMMARY Latin hypercube designs have found wide application. Such designs guarantee uniform samples for the marginal distribution of each input variable. We propose a method for constructing orthogonal Latin hypercube designs in which all the linear terms are orthogonal not only to each other, but also to the quadratic terms. This construction method is convenient and flexible, and the resulting designs can accommodate many more factors than can existing ones.

Journal ArticleDOI
TL;DR: In this paper, a new method was developed to address the group variable selection problem in the Cox proportional hazards model, which not only effectively removes unimportant groups, but also maintains the flexibility of selecting variables within the identified groups.
Abstract: SUMMARY In many biological and other scientific applications, predictors are often naturally grouped. For example, in biological applications, assayed genes or proteins are grouped by biological roles or biological pathways. When studying the dependence of survival outcome on these grouped predictors, it is desirable to select variables at both the group level and the within-group level. In this article, we develop a new method to address the group variable selection problem in the Cox proportional hazards model. Our method not only effectively removes unimportant groups, but also maintains the flexibility of selecting variables within the identified groups. We also show that the new method offers the potential for achieving the asymptotic oracle property.

Journal ArticleDOI
TL;DR: This paper extended the Dantzig selector to fit generalized linear models while eliminating overshrinkage of the coefficient estimates, and developed a computationally efficient algorithm, similar in nature to least angle regression, to compute the entire path of coefficient estimates.
Abstract: SUMMARY The Dantzig selector performs variable selection and model fitting in linear regression. It uses an L1 penalty to shrink the regression coefficients towards zero, in a similar fashion to the lasso. While both the lasso and Dantzig selector potentially do a good job of selecting the correct variables, they tend to overshrink the final coefficients. This results in an unfortunate trade-off. One can either select a high shrinkage tuning parameter that produces an accurate model but poor coefficient estimates or a low shrinkage parameter that produces more accurate coefficients but includes many irrelevant variables. We extend the Dantzig selector to fit generalized linear models while eliminating overshrinkage of the coefficient estimates, and develop a computationally efficient algorithm, similar in nature to least angle regression, to compute the entire path of coefficient estimates. A simulation study illustrates the advantages of our approach relative to others. We apply the methodology to two datasets.
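
For reference, the original Dantzig selector for linear regression, introduced by Candès & Tao and extended here to generalized linear models, is the linear program (standard form, generic notation):

```latex
% Dantzig selector for linear regression: minimize the l1 norm of beta subject
% to a sup-norm bound on the correlation of the residuals with the predictors.
\hat\beta_{\mathrm{DS}} = \arg\min_{\beta}\ \lVert\beta\rVert_1
\quad\text{subject to}\quad
\bigl\lVert X^{\top}(y - X\beta) \bigr\rVert_{\infty} \le \lambda.
```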

Journal ArticleDOI
TL;DR: In this article, the EM-test statistic is proposed for testing homogeneity of finite mixture models, which has a simple limiting distribution for examples in this paper, including mixtures of two geometric distributions and two exponential distributions.
Abstract: SUMMARY Even simple examples of finite mixture models can fail to fulfil the regularity conditions that are routinely assumed in standard parametric inference problems. Many methods have been investigated for testing for homogeneity in finite mixture models, for example, but all rely on regularity conditions including the finiteness of the Fisher information and the space of the mixing parameter being a compact subset of some Euclidean space. Very simple examples where such assumptions fail include mixtures of two geometric distributions and two exponential distributions, and, more generally, mixture models in scale distribution families. To overcome these difficulties, we propose and study an EM-test statistic, which has a simple limiting distribution for examples in this paper. Simulations show that the EM-test has accurate type I errors and is more efficient than existing methods when they are applicable. A real example is also included.

Journal ArticleDOI
TL;DR: A hierarchical model that allows for flexible estimation and clustering, while borrowing information across curves is proposed, and it is shown that the function estimates obtained are consistent on the space of integrable functions.
Abstract: In many modern experimental settings, observations are obtained in the form of functions, and interest focuses on inferences on a collection of such functions. We propose a hierarchical model that allows us to simultaneously estimate multiple curves nonparametrically by using dependent Dirichlet Process mixtures of Gaussians to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of Conductivity and Temperature at Depth data in the north Atlantic.

Journal ArticleDOI
TL;DR: In this article, a sliced space-filling design for computer experiments with qualitative and quantitative factors is proposed, which starts with constructing a Latin hypercube design based on a special orthogonal array for the quantitative factors and then partitions the design into groups corresponding to different level combinations of the qualitative factors.
Abstract: We propose an approach to constructing a new type of design, a sliced space-filling design, intended for computer experiments with qualitative and quantitative factors. The approach starts with constructing a Latin hypercube design based on a special orthogonal array for the quantitative factors and then partitions the design into groups corresponding to different level combinations of the qualitative factors. The points in each group have good space-filling properties. Some illustrative examples are given.

Journal ArticleDOI
TL;DR: A pseudo-partial likelihood for proportional hazards models with biased-sampling data is obtained by embedding the biased-sampling data into left-truncated data, and asymptotic properties of the estimator that maximizes the pseudo-partial likelihood are derived.
Abstract: We obtain a pseudo-partial likelihood for proportional hazards models with biased-sampling data by embedding the biased-sampling data into left-truncated data. The log pseudo-partial likelihood of the biased-sampling data is the expectation of the log partial likelihood of the left-truncated data conditioned on the observed data. In addition, asymptotic properties of the estimator that maximize the pseudo-partial likelihood are derived. Applications to length-biased data, biased samples with right censoring and proportional hazards models with missing covariates are discussed.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the use of composite likelihoods instead of the full likelihood for directional data based on a bivariate von Mises distribution and showed that the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate.
Abstract: In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same parameter values for any number of observations. Examples include log-linear models and multivariate normal models. In other cases the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate. An application is given to an example in directional data based on a bivariate von Mises distribution.
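
A composite likelihood in the sense used here is a weighted sum of log-likelihood components, each a marginal or conditional density of sub-vectors of the data (generic definition, stated as background):

```latex
% Composite log-likelihood built from marginal and/or conditional components,
% with nonnegative weights w_k; y_{A_k} and y_{B_k} are sub-vectors of y.
\ell_{C}(\theta; y) = \sum_{k=1}^{K} w_k \,
  \log f\bigl( y_{A_k} \mid y_{B_k};\, \theta \bigr),
```

with B_k empty giving a marginal component. The awkward normalizing constant of the full density then cancels from each tractable component, and the paper shows that in closed exponential families these composite likelihoods and the full likelihood are maximized by the same parameter values.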

Journal ArticleDOI
TL;DR: This work considers marginal proportional hazards regression models for case-cohort studies with multiple disease outcomes and proposes an estimating equation approach for parameter estimation with two different types of weights.
Abstract: Case-cohort study designs are widely used to reduce the cost of large cohort studies while achieving the same goals, especially when the disease rate is low. A key advantage of the case-cohort study design is its capacity to use the same subcohort for several diseases or for several subtypes of disease. In order to compare the effect of a risk factor on different types of diseases, times to different events need to be modelled simultaneously. Valid statistical methods that take the correlations among the outcomes from the same subject into account need to be developed. To this end, we consider marginal proportional hazards regression models for case-cohort studies with multiple disease outcomes. We also consider generalized case-cohort designs that do not require sampling all the cases, which is more realistic for multiple disease outcomes. We propose an estimating equation approach for parameter estimation with two different types of weights. Consistency and asymptotic normality of the proposed estimators are established. Large sample approximation works well in small samples in simulation studies. The proposed methods are applied to the Busselton Health Study.

Journal ArticleDOI
TL;DR: In this paper, the authors apply Fisher's fiducial idea to conduct statistical inference for wavelet regression, and propose fiducial-based methods for performing wavelet curve estimation, as well as constructing both pointwise and curvewise confidence intervals.
Abstract: We apply Fisher’s fiducial idea to conduct statistical inference for wavelet regression. We first develop a general methodology for handling model selection problems within the fiducial framework. With this new methodology we then propose fiducial-based methods for performing wavelet curve estimation, as well as constructing both pointwise and curvewise confidence intervals. It is shown that, under some mild regularity conditions, both the new fiducial-based pointwise and curvewise confidence intervals have asymptotically correct coverage. Furthermore, simulation results show that these new fiducial-based methods, especially for constructing pointwise confidence intervals, also possess promising empirical properties. To the best of our knowledge, this is the first time that the fiducial idea has been applied to a nonparametric estimation problem.

Journal ArticleDOI
TL;DR: There are two different ways to formalize the notion that only part of the missingness is ignorable, which are explained and applied in a latent-class analysis of survey questions with item nonresponse.
Abstract: SUMMARY When an assumption of missing at random is untenable, it becomes necessary to model missing-data indicators, which carry information about the parameters of the complete-data population. Within a given application, however, researchers may believe that some aspects of missingness are ignorable but others are not. We argue that there are two different ways to formalize the notion that only part of the missingness is ignorable. These approaches correspond to assumptions that we call partially missing at random and latently missing at random. We explain these concepts and apply them in a latent-class analysis of survey questions with item nonresponse.

Journal ArticleDOI
TL;DR: This paper extends the induced smoothing procedure for the semiparametric accelerated failure time model to the case of clustered failure time data and proves that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing.
Abstract: SUMMARY This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton–Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost.