
Showing papers in "Journal of the American Statistical Association" in 2012


Journal ArticleDOI
TL;DR: This work considers the problem of detecting multiple changepoints in large data sets and introduces a new method for finding the minimum of such cost functions and hence the optimal number and location of changepoints that has a computational cost which is linear in the number of observations.
Abstract: In this article, we consider the problem of detecting multiple changepoints in large datasets. Our focus is on applications where the number of changepoints will increase as we collect more data: for example, in genetics as we analyze larger regions of the genome, or in finance as we observe time series over longer periods. We consider the common approach of detecting changepoints through minimizing a cost function over possible numbers and locations of changepoints. This includes several established procedures for detecting changepoints, such as penalized likelihood and minimum description length. We introduce a new method for finding the minimum of such cost functions and hence the optimal number and location of changepoints that has a computational cost that, under mild conditions, is linear in the number of observations. This compares favorably with existing methods for the same problem, whose computational cost can be quadratic or even cubic. In simulation studies, we show that our new method can...
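The search itself can be illustrated with the standard optimal-partitioning dynamic program, which is quadratic in the number of observations; the article's contribution is a pruning rule that cuts the candidate set down so the cost becomes effectively linear. A minimal sketch, assuming a Gaussian change-in-mean segment cost and a fixed penalty per changepoint:

```python
import numpy as np

def cost(cumsum, cumsum2, s, t):
    """Within-segment sum of squares of y[s:t] (Gaussian change-in-mean cost)."""
    n = t - s
    seg_sum = cumsum[t] - cumsum[s]
    seg_sum2 = cumsum2[t] - cumsum2[s]
    return seg_sum2 - seg_sum**2 / n

def optimal_partitioning(y, beta):
    """Minimize total segment cost + beta * (number of changepoints).
    This is the O(n^2) dynamic program; pruning the candidate start points
    is what reduces the search to roughly linear time."""
    n = len(y)
    cumsum = np.concatenate(([0.0], np.cumsum(y)))
    cumsum2 = np.concatenate(([0.0], np.cumsum(y**2)))
    F = np.full(n + 1, np.inf)      # F[t] = optimal penalized cost of y[:t]
    F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        cands = [F[s] + cost(cumsum, cumsum2, s, t) + beta for s in range(t)]
        s_star = int(np.argmin(cands))
        F[t], last[t] = cands[s_star], s_star
    cps, t = [], n                  # backtrack changepoint locations
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

y = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(3, 1, 100)])
print(optimal_partitioning(y, beta=3 * np.log(len(y))))
```

Replacing the inner loop over all candidate start points with a pruned candidate set is the step that yields the linear-time behavior described in the abstract.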

1,647 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose an estimation approach that takes advantage of the rich sources in tick-by-tick data while preserving the continuous-time assumption on the underlying returns.
Abstract: It is a common practice in finance to estimate volatility from the sum of frequently sampled squared returns. However, market microstructure poses challenges to this estimation approach, as evidenced by recent empirical studies in finance. The present work attempts to lay out theoretical grounds that reconcile continuous-time modeling and discrete-time samples. We propose an estimation approach that takes advantage of the rich sources in tick-by-tick data while preserving the continuous-time assumption on the underlying returns. Under our framework, it becomes clear why and where the “usual” volatility estimator fails when the returns are sampled at the highest frequencies. If the noise is asymptotically small, our work provides a way of finding the optimal sampling frequency. A better approach, the “two-scales estimator,” works for any size of the noise.
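A hedged sketch of the two-scales construction: average the realized volatility over K subsampled grids, then remove the noise-induced bias using the full-grid realized volatility. The subsampling scheme, parameter names, and simulated data below are illustrative:

```python
import numpy as np

def tsrv(log_prices, K=5):
    """Two-scales realized volatility: average of K subsampled realized
    volatilities, bias-corrected by the full-grid realized volatility.
    A sketch of the estimator's structure, not the paper's exact
    small-sample adjustment."""
    p = np.asarray(log_prices)
    n = len(p) - 1                      # number of full-grid returns
    rv_all = np.sum(np.diff(p) ** 2)    # realized volatility on the finest grid
    rv_sub = np.mean([np.sum(np.diff(p[k::K]) ** 2) for k in range(K)])
    n_bar = (n - K + 1) / K             # average number of returns per sparse grid
    return rv_sub - (n_bar / n) * rv_all

# Simulated prices: constant true volatility plus iid microstructure noise
np.random.seed(0)
true_sigma2 = 1e-4
p = np.cumsum(np.sqrt(true_sigma2 / 23400) * np.random.randn(23400))
p_noisy = p + 5e-4 * np.random.randn(len(p))
print("two-scales:", tsrv(p_noisy, K=30), "naive:", np.sum(np.diff(p_noisy) ** 2))
```

On this toy example the naive sum of squared returns is dominated by the noise, while the two-scales estimate stays close to the true integrated variance of 1e-4.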

726 citations


Journal ArticleDOI
TL;DR: This article shows that estimating an optimal ITR that is a deterministic function of patient-specific characteristics maximizing expected clinical outcome is equivalent to a classification problem where each subject is weighted proportional to his or her clinical outcome and proposes an outcome weighted learning approach based on the support vector machine framework.
Abstract: There is increasing interest in discovering individualized treatment rules (ITRs) for patients who have heterogeneous responses to treatment. In particular, one aims to find an optimal ITR that is a deterministic function of patient-specific characteristics maximizing expected clinical outcome. In this article, we first show that estimating such an optimal treatment rule is equivalent to a classification problem where each subject is weighted proportional to his or her clinical outcome. We then propose an outcome weighted learning approach based on the support vector machine framework. We show that the resulting estimator of the treatment rule is consistent. We further obtain a finite sample bound for the difference between the expected outcome using the estimated ITR and that of the optimal treatment rule. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data.
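Since the abstract casts the problem as outcome-weighted classification, a minimal sketch can be built on an off-the-shelf weighted SVM; the data-generating process, the shift of the outcome to keep weights nonnegative, and the absence of tuning are simplifications rather than the paper's exact procedure:

```python
import numpy as np
from sklearn.svm import SVC

# Simulated randomized trial: X covariates, A treatment in {-1, +1}, R reward
rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)                 # randomized, propensity 0.5
optimal = np.sign(X[:, 0] - X[:, 1])            # true best treatment (unknown in practice)
R = 1.0 + (A == optimal) + rng.normal(scale=0.5, size=n)

# Outcome weighted learning: classify A with weights R / propensity.
# Weights must be nonnegative, so shift R first (a common practical adjustment).
w = (R - R.min()) / 0.5
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, A, sample_weight=w)

d_hat = clf.predict(X)                          # estimated individualized treatment rule
print("agreement with true optimal rule:", np.mean(d_hat == optimal))
```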

697 citations


Journal ArticleDOI
TL;DR: In this article, a sure independence screening procedure based on distance correlation (DC-SIS) was proposed for ultra-high-dimensional data analysis, which can be used directly to screen grouped predictor variables and multivariate response variables.
Abstract: This article is concerned with screening features in ultrahigh-dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance correlation (DC-SIS). The DC-SIS can be implemented as easily as the sure independence screening (SIS) procedure based on the Pearson correlation proposed by Fan and Lv. However, the DC-SIS can significantly improve the SIS. Fan and Lv established the sure screening property for the SIS based on linear models, but the sure screening property is valid for the DC-SIS under more general settings, including linear models. Furthermore, the implementation of the DC-SIS does not require model specification (e.g., linear model or generalized linear model) for responses or predictors. This is a very appealing property in ultrahigh-dimensional data analysis. Moreover, the DC-SIS can be used directly to screen grouped predictor variables and multivariate response variables. We establish ...
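A hedged sketch of the screening step: compute the distance correlation between each candidate predictor and the response and retain the top-ranked features. The distance-correlation routine below is a direct implementation of the standard sample definition, and the toy data are illustrative:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-d arrays (standard definition)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()   # double-centered distances
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)) if dvar_x * dvar_y > 0 else 0.0

def dc_sis(X, y, d):
    """Rank predictors by distance correlation with y; keep the top d."""
    omega = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(omega)[::-1][:d]

# Toy example: only the first two of 1000 predictors matter (nonlinearly for the second)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)
print(dc_sis(X, y, d=10))   # indices 0 and 1 should usually rank near the top
```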

641 citations


Journal ArticleDOI
TL;DR: It is shown that with gross-exposure constraints, the empirically selected optimal portfolios based on estimated covariance matrices have similar performance to the theoretical optimal ones and there is no error accumulation effect from estimation of vast covarianceMatrices.
Abstract: This article introduces the large portfolio selection using gross-exposure constraints. It shows that with gross-exposure constraints, the empirically selected optimal portfolios based on estimated covariance matrices have similar performance to the theoretical optimal ones and there is no error accumulation effect from estimation of vast covariance matrices. This gives theoretical justification to the empirical results by Jagannathan and Ma. It also shows that the no-short-sale portfolio can be improved by allowing some short positions. The applications to portfolio selection, tracking, and improvements are also addressed. The utility of our new approach is illustrated by simulation and empirical studies on the 100 Fama–French industrial portfolios and the 600 stocks randomly selected from Russell 3000.
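A hedged sketch of the underlying optimization: minimize the estimated portfolio variance subject to full investment and a gross-exposure budget c on the L1 norm of the weights; c = 1 corresponds to the no-short-sale portfolio and larger c permits limited short positions. The split w = u − v and the generic solver are illustrative choices, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def gross_exposure_portfolio(Sigma, c):
    """Minimum-variance weights subject to sum(w) = 1 and ||w||_1 <= c.
    Split w = u - v with u, v >= 0 so the L1 constraint becomes linear."""
    p = Sigma.shape[0]
    def variance(z):
        w = z[:p] - z[p:]
        return w @ Sigma @ w
    cons = [
        {"type": "eq", "fun": lambda z: np.sum(z[:p] - z[p:]) - 1.0},  # fully invested
        {"type": "ineq", "fun": lambda z: c - np.sum(z)},              # gross-exposure budget
    ]
    z0 = np.concatenate([np.full(p, 1.0 / p), np.zeros(p)])
    res = minimize(variance, z0, bounds=[(0, None)] * (2 * p),
                   constraints=cons, method="SLSQP")
    return res.x[:p] - res.x[p:]

# Toy covariance estimated from simulated returns on 20 assets
rng = np.random.default_rng(0)
R = rng.normal(size=(250, 20)) @ np.diag(np.linspace(0.5, 1.5, 20))
Sigma_hat = np.cov(R, rowvar=False)
for c in (1.0, 1.6, 2.0):
    w = gross_exposure_portfolio(Sigma_hat, c)
    print(c, round(float(w @ Sigma_hat @ w), 4), round(float(np.abs(w).sum()), 2))
```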

299 citations


Journal ArticleDOI
TL;DR: It is proved that this method to estimate block membership of nodes in a random graph generated by a stochastic blockmodel is consistent for assigning nodes to blocks, as only a negligible number of nodes will be misassigned.
Abstract: We present a method to estimate block membership of nodes in a random graph generated by a stochastic blockmodel. We use an embedding procedure motivated by the random dot product graph model, a particular example of the latent position model. The embedding associates each node with a vector; these vectors are clustered via minimization of a square error criterion. We prove that this method is consistent for assigning nodes to blocks, as only a negligible number of nodes will be misassigned. We prove consistency of the method for directed and undirected graphs. The consistent block assignment makes possible consistent parameter estimation for a stochastic blockmodel. We extend the result in the setting where the number of blocks grows slowly with the number of nodes. Our method is also computationally feasible even for very large graphs. We compare our method with Laplacian spectral clustering through analysis of simulated data and a graph derived from Wikipedia documents.
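A hedged sketch of the embed-then-cluster pipeline the abstract describes: take the leading eigenpairs of the adjacency matrix, scale the eigenvectors to obtain estimated latent positions, and cluster the rows (here with k-means rather than the paper's exact clustering criterion):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_embed_cluster(A, d, K):
    """Embed nodes via the top-d eigenpairs of the adjacency matrix,
    scale eigenvectors by sqrt(|eigenvalue|), then cluster the rows."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]          # top-d by magnitude
    Xhat = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))  # estimated latent positions
    return KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(Xhat)

# Two-block stochastic blockmodel
rng = np.random.default_rng(0)
n, B = 300, np.array([[0.5, 0.1], [0.1, 0.4]])
z = rng.integers(0, 2, size=n)
P = B[z][:, z]
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                        # undirected, no self-loops
labels = spectral_embed_cluster(A, d=2, K=2)
print("agreement:", max(np.mean(labels == z), np.mean(labels != z)))
```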

296 citations


Journal ArticleDOI
TL;DR: Theoretically, it is shown that constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation.
Abstract: In high-dimensional data analysis, feature selection becomes one effective means for dimension reduction, which proceeds with parameter estimation. Concerning accuracy of selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation. It permits up to exponentially many candidate features. Computationally, we develop difference convex methods to implement the computational surrogate through primal and dual subproblems. These results establish a central role of L0 constrained and regularized likelihoods in feature selection and parameter estimation involving selection. As applications of the general method and theory, we perform feature selection...

282 citations


Journal ArticleDOI
TL;DR: A novel, sufficient optimality condition that relies on a convex differencing representation of the penalized loss function and the subdifferential calculus is introduced that enables the oracle property for sparse quantile regression in the ultra-high dimension under relaxed conditions.
Abstract: Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity, which assumes that only a small number of covariates influence the conditional distribution of the response variable, given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex, penalized quantile regression in ultra-high dimension. The proposed approach has two distinctive features: (1) It enables us to explore the entire conditional distribution of the response variable, given the ultra-high-dimensional covariates, and provides a more realistic picture of the sparsity pattern; (2) it requires substantially weaker conditions compared with alternative methods in the literature; thus, it greatly alleviates the...
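For reference, the penalized objective has the generic form below, where ρ_τ is the quantile check loss and p_λ is a nonconvex penalty such as SCAD or MCP; this is a sketch of the setup rather than the article's exact notation:

```latex
\hat{\beta}(\tau) \;=\; \arg\min_{\beta}\;
\sum_{i=1}^{n} \rho_\tau\!\bigl(y_i - x_i^{\top}\beta\bigr)
\;+\; \sum_{j=1}^{p} p_\lambda\!\bigl(|\beta_j|\bigr),
\qquad
\rho_\tau(u) \;=\; u\,\{\tau - \mathbf{1}(u < 0)\}.
```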

266 citations


Journal ArticleDOI
TL;DR: Modifications of Bayesian model selection methods by imposing nonlocal prior densities on model parameters are proposed and it is demonstrated that these model selection procedures perform as well or better than commonly used penalized likelihood methods in a range of simulation settings.
Abstract: Standard assumptions incorporated into Bayesian model selection procedures result in procedures that are not competitive with commonly used penalized likelihood methods. We propose modifications of these methods by imposing nonlocal prior densities on model parameters. We show that the resulting model selection procedures are consistent in linear model settings when the number of possible covariates p is bounded by the number of observations n, a property that has not been extended to other model selection procedures. In addition to consistently identifying the true model, the proposed procedures provide accurate estimates of the posterior probability that each identified model is correct. Through simulation studies, we demonstrate that these model selection procedures perform as well or better than commonly used penalized likelihood methods in a range of simulation settings. Proofs of the primary theorems are provided in the Supplementary Material that is available online.

251 citations


Journal ArticleDOI
TL;DR: The authors derived informative bounds on the average treatment effect (ATE) of SNAP on child food insecurity, poor general health, obesity, and anemia across a range of different assumptions used to address the selection and classification error problems.
Abstract: The literature assessing the efficacy of the Supplemental Nutrition Assistance Program (SNAP), formerly known as the Food Stamp Program, has long puzzled over positive associations between SNAP receipt and various undesirable health outcomes such as food insecurity. Assessing the causal impacts of SNAP, however, is hampered by two key identification problems: endogenous selection into participation and extensive systematic underreporting of participation status. Using data from the National Health and Nutrition Examination Survey (NHANES), we extend partial identification bounding methods to account for these two identification problems in a single unifying framework. Specifically, we derive informative bounds on the average treatment effect (ATE) of SNAP on child food insecurity, poor general health, obesity, and anemia across a range of different assumptions used to address the selection and classification error problems. In particular, to address the selection problem, we apply relatively weak nonparam...

244 citations


Journal ArticleDOI
TL;DR: A group-lasso type penalty is applied that treats each row of the matrix of the regression coefficients as a group and shows that this penalty satisfies certain desirable invariance properties of the reduced-rank regression coefficient matrix.
Abstract: Reduced-rank regression is an effective method for predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression...
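In generic notation, the estimator combines a rank constraint with a row-wise group-lasso penalty, where c_{j·} denotes the j-th row of the coefficient matrix C (the coefficients of predictor j across all responses), so that a zero row drops that predictor entirely; the exact normalization in the article may differ:

```latex
\hat{C} \;=\; \arg\min_{\operatorname{rank}(C)\,\le\, r}\;
\tfrac{1}{2}\,\bigl\|Y - XC\bigr\|_F^{2}
\;+\; \lambda \sum_{j=1}^{p} \bigl\|c_{j\cdot}\bigr\|_2 .
```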

Journal ArticleDOI
TL;DR: In this paper, the authors present a new method for optimal matching in observational studies based on mixed integer programming, which achieves covariate balance directly by minimizing both the total sum of distances and a weighted sum of specific measures of covariate imbalance.
Abstract: This article presents a new method for optimal matching in observational studies based on mixed integer programming. Unlike widely used matching methods based on network algorithms, which attempt to achieve covariate balance by minimizing the total sum of distances between treated units and matched controls, this new method achieves covariate balance directly, either by minimizing both the total sum of distances and a weighted sum of specific measures of covariate imbalance, or by minimizing the total sum of distances while constraining the measures of imbalance to be less than or equal to certain tolerances. The inclusion of these extra terms in the objective function or the use of these additional constraints explicitly optimizes or constrains the criteria that will be used to evaluate the quality of the match. For example, the method minimizes or constrains differences in univariate moments, such as means, variances, and skewness; differences in multivariate moments, such as correlations between covari...
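The constrained variant described above can be written, in generic notation, as the following mixed integer program, where d_{ij} is a covariate distance, x_{ij} indicates matching treated unit i to control j, z_{·p} is covariate p, and ε_p is a user-chosen imbalance tolerance; this is a sketch, not the article's exact formulation:

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i \in T}\sum_{j \in C} d_{ij}\, x_{ij} \\
\text{s.t.}\quad
& \sum_{j \in C} x_{ij} = 1 \;\; \forall i \in T, \qquad
  \sum_{i \in T} x_{ij} \le 1 \;\; \forall j \in C, \\
& \Bigl|\tfrac{1}{|T|}\textstyle\sum_{i \in T}\sum_{j \in C} x_{ij}\,\bigl(z_{ip} - z_{jp}\bigr)\Bigr| \le \varepsilon_p \;\; \forall p, \qquad
  x_{ij} \in \{0,1\}.
\end{aligned}
```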

Journal ArticleDOI
TL;DR: In this article, a principal factor approximation (PFA) based method was proposed to solve the problem of false discovery control in large-scale multiple hypothesis testing, where a common threshold is used and a consistent estimate of realized FDP is provided.
Abstract: Multiple hypothesis testing is a fundamental problem in high-dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any single-nucleotide polymorphisms (SNPs) are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In this article, we propose a novel method—based on principal factor approximation—that successfully subtracts the common dependence and weakens significantly the correlation structure, to deal with an arbitrary dependence structure. We derive an approximate expression for false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used and provide a consistent estimate of realized FDP. This result has important applications in controlling false discovery rate and FDP. Our estimate of realized FDP compares favorably with Efr...

Journal ArticleDOI
TL;DR: This work develops an approach to producing density forecasts for the wind power generated at individual wind farms using a VARMA-GARCH model and conditional kernel density estimation, which enables a nonparametric modeling of the conditional density of wind power.
Abstract: Of the various renewable energy resources, wind power is widely recognized as one of the most promising. The management of wind farms and electricity systems can benefit greatly from the availability of estimates of the probability distribution of wind power generation. However, most research has focused on point forecasting of wind power. In this article, we develop an approach to producing density forecasts for the wind power generated at individual wind farms. Our interest is in intraday data and prediction from 1 to 72 hours ahead. We model wind power in terms of wind speed and wind direction. In this framework, there are two key uncertainties. First, there is the inherent uncertainty in wind speed and direction, and we model this using a bivariate vector autoregressive moving average-generalized autoregressive conditional heteroscedastic (VARMA-GARCH) model, with a Student t error distribution, in the Cartesian space of wind speed and direction. Second, there is the stochastic nature of the relations...

Journal ArticleDOI
TL;DR: This study introduces a new class of models for multivariate discrete data based on pair copula constructions (PCCs) that has two major advantages: discrete PCCs are shown to attain highly flexible dependence structures, and the high quality of inference-function-for-margins and maximum likelihood estimates is demonstrated.
Abstract: Multivariate discrete response data can be found in diverse fields, including econometrics, finance, biometrics, and psychometrics. Our contribution, through this study, is to introduce a new class of models for multivariate discrete data based on pair copula constructions (PCCs) that has two major advantages. First, by deriving the conditions under which any multivariate discrete distribution can be decomposed as a PCC, we show that discrete PCCs attain highly flexible dependence structures. Second, the computational burden of evaluating the likelihood for an m-dimensional discrete PCC only grows quadratically with m. This compares favorably to existing models for which computing the likelihood either requires the evaluation of 2 m terms or slow numerical integration methods. We demonstrate the high quality of inference function for margins and maximum likelihood estimates, both under a simulated setting and for an application to a longitudinal discrete dataset on headache severity. This article has onli...

Journal ArticleDOI
TL;DR: The semiparametric approach reveals that in the inverse regression context while keeping the estimation structure intact, the common assumption of linearity and/or constant variance on the covariates can be removed at the cost of performing additional nonparametric regression.
Abstract: We provide a novel and completely different approach to dimension-reduction problems from the existing literature. We cast the dimension-reduction problem in a semiparametric estimation framework and derive estimating equations. Viewing this problem from the new angle allows us to derive a rich class of estimators, and obtain the classical dimension reduction techniques as special cases in this class. The semiparametric approach also reveals that in the inverse regression context while keeping the estimation structure intact, the common assumption of linearity and/or constant variance on the covariates can be removed at the cost of performing additional nonparametric regression. The semiparametric estimators without these common assumptions are illustrated through simulation studies and a real data example. This article has online supplementary material.

Journal ArticleDOI
TL;DR: In this article, the authors estimate aerosol concentrations from remote sensing instruments so as to minimize errors “downstream” in climate models by combining information from multiple remote sensing data sources.
Abstract: Aerosols are tiny solid or liquid particles suspended in the atmosphere; examples of aerosols include windblown dust, sea salts, volcanic ash, smoke from wildfires, and pollution from factories. The global distribution of aerosols is a topic of great interest in climate studies since aerosols can either cool or warm the atmosphere depending on their location, type, and interaction with clouds. Aerosol concentrations are important input components of global climate models, and it is crucial to accurately estimate aerosol concentrations from remote sensing instruments so as to minimize errors “downstream” in climate models. Currently, space-based observations of aerosols are available from two remote sensing instruments on board NASA's Terra spacecraft: the Multiangle Imaging SpectroRadiometer (MISR), and the MODerate-resolution Imaging Spectroradiometer (MODIS). These two instruments have complementary coverage, spatial support, and retrieval characteristics, making it advantageous to combine information from b...

Journal ArticleDOI
TL;DR: Two methods for estimating space and space-time covariance functions from a Gaussian random field based on the composite likelihood idea are proposed, which are useful for practitioners looking for a good balance between computational complexity and statistical efficiency.
Abstract: In this article, we propose two methods for estimating space and space-time covariance functions from a Gaussian random field, based on the composite likelihood idea. The first method relies on the maximization of a weighted version of the composite likelihood function, while the second one is based on the solution of a weighted composite score equation. This last scheme is quite general and could be applied to any kind of composite likelihood. An information criterion for model selection based on the first estimation method is also introduced. The methods are useful for practitioners looking for a good balance between computational complexity and statistical efficiency. The effectiveness of the methods is illustrated through examples, simulation experiments, and by analyzing a dataset on ozone measurements.

Journal ArticleDOI
TL;DR: This article embeds a classical mathematical epidemiology model [a susceptible-exposed-infected-recovered (SEIR) model] within the state-space framework, thereby extending the SEIR dynamics to allow changes through time.
Abstract: In this article, we use Google Flu Trends data together with a sequential surveillance model based on state-space methodology to track the evolution of an epidemic process over time. We embed a classical mathematical epidemiology model [a susceptible-exposed-infected-recovered (SEIR) model] within the state-space framework, thereby extending the SEIR dynamics to allow changes through time. The implementation of this model is based on a particle filtering algorithm, which learns about the epidemic process sequentially through time and provides updated estimated odds of a pandemic with each new surveillance data point. We show how our approach, in combination with sequential Bayes factors, can serve as an online diagnostic tool for influenza pandemic. We take a close look at the Google Flu Trends data describing the spread of flu in the United States during 2003–2009 and in nine separate US states chosen to represent a wide range of health care and emergency system strengths and weaknesses. This article h
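A hedged sketch of the deterministic SEIR step such a state-space model embeds; the parameter names (beta, sigma, gamma) and the Euler discretization are generic, and the article additionally lets the dynamics change over time and wraps the step in a particle filter that reweights trajectories against each new surveillance observation:

```python
import numpy as np

def seir_step(state, beta, sigma, gamma, dt=1.0):
    """One discrete-time Euler step of the SEIR compartmental model.
    state = (S, E, I, R) as fractions of the population."""
    S, E, I, R = state
    new_exposed   = beta * S * I * dt     # susceptible -> exposed
    new_infected  = sigma * E * dt        # exposed -> infectious
    new_recovered = gamma * I * dt        # infectious -> recovered
    return np.array([S - new_exposed,
                     E + new_exposed - new_infected,
                     I + new_infected - new_recovered,
                     R + new_recovered])

# A particle filter would propagate many such trajectories with perturbed
# parameters and reweight them by the likelihood of each new surveillance count.
state = np.array([0.99, 0.005, 0.005, 0.0])
for t in range(60):
    state = seir_step(state, beta=0.6, sigma=0.2, gamma=0.25)
print("infectious fraction after 60 days:", round(float(state[2]), 4))
```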

Journal ArticleDOI
TL;DR: The proposed method for constructing a new type of space-filling design, called a sliced Latin hypercube design and intended for running computer experiments, is easy to implement, capable of accommodating any number of factors, and flexible in run size.
Abstract: This article proposes a method for constructing a new type of space-filling design, called a sliced Latin hypercube design, intended for running computer experiments. Such a design is a special Latin hypercube design that can be partitioned into slices of smaller Latin hypercube designs. It is desirable to use the constructed designs for collective evaluations of computer models and ensembles of multiple computer models. The proposed construction method is easy to implement, capable of accommodating any number of factors, and flexible in run size. Examples are given to illustrate the method. Sampling properties of the constructed designs are examined. Numerical illustration is provided to corroborate the derived theoretical results.
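A hedged sketch of one standard construction idea for a sliced Latin hypercube: build t slices of m runs each so that every slice collapses to an m-level Latin hypercube while the stacked n = m·t runs form a Latin hypercube on the fine n-level grid. This follows the general recipe in the design literature; the article's exact algorithm may differ:

```python
import numpy as np

def sliced_lhd(m, t, p, rng=None):
    """Return an array of shape (t, m, p): t slices of m runs in p factors.
    Each slice is a Latin hypercube after collapsing the n = m*t fine levels
    into m coarse levels, and the stacked design is an n-run Latin hypercube."""
    rng = np.random.default_rng(rng)
    n = m * t
    design = np.empty((t, m, p))
    for d in range(p):
        # H[i, :] is a random permutation of the t fine levels inside coarse cell i
        H = np.array([rng.permutation(np.arange(i * t, (i + 1) * t)) for i in range(m)])
        for k in range(t):
            levels = rng.permutation(H[:, k])          # one fine level per coarse cell
            design[k, :, d] = (levels + rng.uniform(size=m)) / n
    return design

D = sliced_lhd(m=5, t=3, p=2, rng=0)
full = D.reshape(-1, 2)
print(np.sort((full[:, 0] * 15).astype(int)))   # 0..14: stacked design is a 15-run LHD
print(np.sort((D[0, :, 0] * 5).astype(int)))    # 0..4 : each slice collapses to a 5-run LHD
```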

Journal ArticleDOI
TL;DR: A positive-definite ℓ1-penalized covariance estimator for estimating sparse large covariance matrices is developed and an efficient alternating direction method is derived to solve the challenging optimization problem and establish its convergence properties.
Abstract: The thresholding covariance estimator has nice asymptotic properties for estimating sparse large covariance matrices, but it often has negative eigenvalues when used in real data analysis. To fix this drawback of thresholding estimation, we develop a positive-definite ℓ1-penalized covariance estimator for estimating sparse large covariance matrices. We derive an efficient alternating direction method to solve the challenging optimization problem and establish its convergence properties. Under weak regularity conditions, nonasymptotic statistical theory is also established for the proposed estimator. The competitive finite-sample performance of our proposal is demonstrated by both simulation and real applications.
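A hedged sketch of an alternating-direction scheme for the kind of penalized problem the abstract describes, alternating an eigenvalue-clipping step (positive definiteness) with entrywise soft-thresholding (sparsity); the penalty placement, step size, and stopping rule are simplifications, not the authors' exact algorithm:

```python
import numpy as np

def soft_threshold(M, thr):
    """Entrywise soft-thresholding, leaving the diagonal unpenalized."""
    T = np.sign(M) * np.maximum(np.abs(M) - thr, 0.0)
    np.fill_diagonal(T, np.diag(M))
    return T

def positive_definite_sparse_cov(S, lam, eps=1e-2, rho=1.0, n_iter=200):
    """Sketch of ADMM for: min_{Sigma >= eps*I}
    0.5*||Sigma - S||_F^2 + lam*||Sigma||_{1, off-diagonal}."""
    Theta = S.copy()
    U = np.zeros_like(S)
    for _ in range(n_iter):
        # Sigma-step: blend S with the sparse target, then clip eigenvalues at eps
        M = (S + rho * (Theta - U)) / (1.0 + rho)
        vals, vecs = np.linalg.eigh(M)
        Sigma = (vecs * np.maximum(vals, eps)) @ vecs.T
        # Theta-step: sparsify off-diagonal entries
        Theta = soft_threshold(Sigma + U, lam / rho)
        # Dual update
        U += Sigma - Theta
    return Theta

# Toy example: thresholding a noisy sample covariance can lose positive definiteness
rng = np.random.default_rng(0)
p, n = 40, 60
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)
Sig_hat = positive_definite_sparse_cov(S, lam=0.2)
print("smallest eigenvalue:", round(float(np.linalg.eigvalsh(Sig_hat).min()), 4))
```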

Journal ArticleDOI
TL;DR: It is shown that DD-classifier is asymptotically equivalent to the Bayes rule under suitable conditions, and it can achieve Bayes error for a family broader than elliptical distributions.
Abstract: Using the DD-plot (depth vs. depth plot), we introduce a new nonparametric classification algorithm and call it DD-classifier. The algorithm is completely nonparametric, and it requires no prior knowledge of the underlying distributions or the form of the separating curve. Thus, it can be applied to a wide range of classification problems. The algorithm is completely data driven and its classification outcome can be easily visualized in a two-dimensional plot regardless of the dimension of the data. Moreover, it has the advantage of bypassing the estimation of underlying parameters such as means and scales, which is often required by the existing classification procedures. We study the asymptotic properties of the DD-classifier and its misclassification rate. Specifically, we show that DD-classifier is asymptotically equivalent to the Bayes rule under suitable conditions, and it can achieve Bayes error for a family broader than elliptical distributions. The performance of the classifier is also examined u...
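A hedged sketch of the DD-plot coordinates using Mahalanobis depth, with the simple max-depth rule as a baseline; the DD-classifier itself fits a flexible polynomial separating curve through the DD-plot rather than using the 45-degree line, and the simulated data here are illustrative:

```python
import numpy as np

def mahalanobis_depth(points, sample):
    """Mahalanobis depth of each point with respect to a sample:
    1 / (1 + squared Mahalanobis distance to the sample mean)."""
    mu = sample.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(sample, rowvar=False))
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, Sinv, diff)
    return 1.0 / (1.0 + d2)

# DD-plot coordinates: depth of each point with respect to class 0 and class 1.
rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=200)
X1 = rng.multivariate_normal([1.5, 1.5], [[1, -0.2], [-0.2, 1]], size=200)
Xtest = rng.multivariate_normal([1.5, 1.5], [[1, -0.2], [-0.2, 1]], size=100)
D0, D1 = mahalanobis_depth(Xtest, X0), mahalanobis_depth(Xtest, X1)
pred = (D1 > D0).astype(int)      # max-depth rule = diagonal line in the DD-plot
print("fraction of test points assigned to class 1:", pred.mean())
```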

Journal ArticleDOI
TL;DR: A novel multicategory generalization of ψ-learning that treats all classes simultaneously and can deliver accurate class prediction and is more robust against extreme observations than its SVM counterpart.
Abstract: In binary classification, margin-based techniques usually deliver high performance. As a result, a multicategory problem is often treated as a sequence of binary classifications. In the absence of a dominating class, this treatment may be suboptimal and may yield poor performance, such as for support vector machines (SVMs). We propose a novel multicategory generalization of ψ-learning that treats all classes simultaneously. The new generalization eliminates this potential problem while at the same time retaining the desirable properties of its binary counterpart. We develop a statistical learning theory for the proposed methodology and obtain fast convergence rates for both linear and nonlinear learning examples. We demonstrate the operational characteristics of this method through a simulation. Our results indicate that the proposed methodology can deliver accurate class prediction and is more robust against extreme observations than its SVM counterpart.

Journal ArticleDOI
TL;DR: In this article, the authors exploit ideas from high-dimensional data analysis to derive new portmanteau tests that are based on the trace of the square of the mth order autocorrelation matrix.
Abstract: We exploit ideas from high-dimensional data analysis to derive new portmanteau tests that are based on the trace of the square of the mth order autocorrelation matrix. The resulting statistics are weighted sums of the squares of the sample autocorrelation coefficients that, unlike many other tests appearing in the literature, are numerically stable even when the number of lags considered is relatively close to the sample size. The statistics behave asymptotically as a linear combination of chi-squared random variables and their asymptotic distribution can be approximated by a gamma distribution. The proposed tests are modified to check for nonlinearity and to check the adequacy of a fitted nonlinear model. Simulation evidence indicates that the proposed goodness of fit tests tend to have higher power than other tests appearing in the literature, particularly in detecting long-memory nonlinear models. The efficacy of the proposed methods is demonstrated by investigating nonlinear effects in Apple, Inc., an...
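A hedged sketch of the statistic's general shape: a weighted sum of squared sample autocorrelations with linearly decaying weights, in the style of a weighted Ljung-Box statistic. The weights and scaling below are illustrative; the article derives the exact weights and the gamma approximation to the null distribution:

```python
import numpy as np

def sample_acf(x, m):
    """Sample autocorrelations r_1, ..., r_m."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, m + 1)])

def weighted_portmanteau(x, m):
    """Weighted sum of squared autocorrelations with linearly decaying
    weights (m - k + 1) / m; illustrative of the statistic's structure."""
    n = len(x)
    r = sample_acf(x, m)
    k = np.arange(1, m + 1)
    w = (m - k + 1) / m
    return n * (n + 2) * np.sum(w * r**2 / (n - k))

rng = np.random.default_rng(0)
white = rng.normal(size=500)
ar1 = np.empty(500); ar1[0] = 0.0
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + rng.normal()
print(weighted_portmanteau(white, m=20), weighted_portmanteau(ar1, m=20))
```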

Journal ArticleDOI
TL;DR: Instrumental variables (IVs) can be used to construct estimators of exposure effects on the outcomes of studies affected by non-ignorable selection of the exposure.
Abstract: Instrumental variables (IVs) can be used to construct estimators of exposure effects on the outcomes of studies affected by nonignorable selection of the exposure. Estimators that fail to adjust for the effects of nonignorable selection will be biased and inconsistent. Such situations commonly arise in observational studies, but are also a problem for randomized experiments affected by nonignorable noncompliance. In this article, we review IV estimators for studies in which the outcome is binary, and consider the links between different approaches developed in the statistics and econometrics literatures. The implicit assumptions made by each method are highlighted and compared within our framework. We illustrate our findings through the reanalysis of a randomized placebo-controlled trial, and highlight important directions for future work in this area.

Journal ArticleDOI
TL;DR: This article considers minimax and adaptive prediction with functional predictors in the framework of functional linear model and reproducing kernel Hilbert space and proposes an easily implementable data-driven roughness regularization predictor that is shown to attain the optimal rate of convergence adaptively without the need of knowing the covariance kernel.
Abstract: This article considers minimax and adaptive prediction with functional predictors in the framework of functional linear model and reproducing kernel Hilbert space. Minimax rate of convergence for the excess prediction risk is established. It is shown that the optimal rate is determined jointly by the reproducing kernel and the covariance kernel. In particular, the alignment of these two kernels can significantly affect the difficulty of the prediction problem. In contrast, the existing literature has so far focused only on the setting where the two kernels are nearly perfectly aligned. This motivates us to propose an easily implementable data-driven roughness regularization predictor that is shown to attain the optimal rate of convergence adaptively without the need of knowing the covariance kernel. Simulation studies are carried out to illustrate the merits of the adaptive predictor and to demonstrate the theoretical results.

Journal ArticleDOI
TL;DR: A valid parametric family of cross-covariance functions for multivariate spatial random fields where each component has a covariance function from a well-celebrated Matérn class is introduced.
Abstract: We introduce a valid parametric family of cross-covariance functions for multivariate spatial random fields where each component has a covariance function from a well-celebrated Matérn class. Unlike previous attempts, our model indeed allows for various smoothnesses and rates of correlation decay for any number of vector components. We present the conditions on the parameter space that result in valid models with varying degrees of complexity. We discuss practical implementations, including reparameterizations to reflect the conditions on the parameter space and an iterative algorithm to increase the computational efficiency. We perform various Monte Carlo simulation experiments to explore the performances of our approach in terms of estimation and cokriging. The application of the proposed multivariate Matérn model is illustrated on two meteorological datasets: temperature/pressure over the Pacific Northwest (bivariate) and wind/temperature/pressure in Oklahoma (trivariate). In the latter case, our flexi...
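For reference, each marginal (and, in the multivariate model, each cross-) covariance is of the Matérn form below, with variance σ², smoothness ν, inverse range a, and K_ν the modified Bessel function of the second kind; the model's validity conditions constrain how these parameters may vary across components:

```latex
C(\mathbf{h}) \;=\; \sigma^2\,\frac{2^{1-\nu}}{\Gamma(\nu)}\,
\bigl(a\,\|\mathbf{h}\|\bigr)^{\nu} K_{\nu}\!\bigl(a\,\|\mathbf{h}\|\bigr),
\qquad \nu > 0,\; a > 0 .
```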

Journal ArticleDOI
TL;DR: This work shows how this can be achieved by augmenting the likelihood with continuous latent variables, and computing inference using the resulting augmented posterior, and establishes the effectiveness of the estimation method by modeling consumer behavior in online retail using Archimedean and Gaussian copulas.
Abstract: Estimation of copula models with discrete margins can be difficult beyond the bivariate case. We show how this can be achieved by augmenting the likelihood with continuous latent variables, and computing inference using the resulting augmented posterior. To evaluate this, we propose two efficient Markov chain Monte Carlo sampling schemes. One generates the latent variables as a block using a Metropolis–Hastings step with a proposal that is close to its target distribution, the other generates them one at a time. Our method applies to all parametric copulas where the conditional copula functions can be evaluated, not just elliptical copulas as in much previous work. Moreover, the copula parameters can be estimated jointly with any marginal parameters, and Bayesian selection ideas can be employed. We establish the effectiveness of the estimation method by modeling consumer behavior in online retail using Archimedean and Gaussian copulas. The example shows that elliptical copulas can be poor at modeling depend...

Journal ArticleDOI
TL;DR: This article proposes a class of shrinkage estimators based on Stein’s unbiased estimate of risk (SURE) and establishes the asymptotic optimality property for the SURE estimators, and applies the methods to two real datasets and obtain encouraging results.
Abstract: Hierarchical models are extensively studied and widely used in statistics and many other scientific areas. They provide an effective tool for combining information from similar resources and achieving partial pooling of inference. Since the seminal work by James and Stein (1961) and Stein (1962), shrinkage estimation has become one major focus for hierarchical models. For the homoscedastic normal model, it is well known that shrinkage estimators, especially the James-Stein estimator, have good risk properties. The heteroscedastic model, though more appropriate for practical applications, is less well studied, and it is unclear what types of shrinkage estimators are superior in terms of the risk. We propose in this article a class of shrinkage estimators based on Stein’s unbiased estimate of risk (SURE). We study asymptotic properties of various common estimators as the number of means to be estimated grows (p → ∞). We establish the asymptotic optimality property for the SURE estimators. We then extend our...
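For the heteroscedastic normal means model, the shrinkage family studied here has the familiar form below, shrinking each observation toward a common location with variance-dependent weights; the hyperparameters (λ, μ) are then chosen by minimizing Stein's unbiased risk estimate rather than by empirical Bayes. Notation is generic and may differ from the article's:

```latex
X_i \sim N(\theta_i, A_i), \qquad
\hat{\theta}_i(\lambda,\mu) \;=\; \frac{\lambda}{\lambda + A_i}\, X_i
\;+\; \frac{A_i}{\lambda + A_i}\, \mu, \qquad i = 1,\dots,p .
```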

Journal ArticleDOI
TL;DR: It is found that in general, moment-based estimators of combined treatment effects and heterogeneity are biased and the degree of bias is proportional to the rarity of the event under study and the new methods eliminate much, but not all, of this bias.
Abstract: We examine the use of fixed-effects and random-effects moment-based meta-analytic methods for analysis of binary adverse-event data. Special attention is paid to the case of rare adverse events that are commonly encountered in routine practice. We study estimation of model parameters and between-study heterogeneity. In addition, we examine traditional approaches to hypothesis testing of the average treatment effect and detection of the heterogeneity of treatment effect across studies. We derive three new methods, a simple (unweighted) average treatment effect estimator, a new heterogeneity estimator, and a parametric bootstrapping test for heterogeneity. We then study the statistical properties of both the traditional and the new methods via simulation. We find that in general, moment-based estimators of combined treatment effects and heterogeneity are biased and the degree of bias is proportional to the rarity of the event under study. The new methods eliminate much, but not all, of this bias. The variou...
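For reference, the classical moment-based (DerSimonian-Laird) heterogeneity estimator against which such methods are typically judged takes the form below, with study effects \hat{\theta}_i, inverse-variance weights w_i = 1/\hat{v}_i, and Cochran's Q; this is the standard benchmark formula, not the article's new estimator:

```latex
Q = \sum_{i=1}^{k} w_i\,\bigl(\hat{\theta}_i - \bar{\theta}_w\bigr)^2,
\qquad
\bar{\theta}_w = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i},
\qquad
\hat{\tau}^2_{\mathrm{DL}} = \max\!\left\{0,\;
\frac{Q - (k-1)}{\sum_i w_i - \sum_i w_i^2 \big/ \sum_i w_i}\right\}.
```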