
Showing papers in "Statistical Science in 2012"


Journal ArticleDOI
TL;DR: In this paper, a unified framework for establishing consistency and convergence rates for regularized $M$-estimators under high-dimensional scaling is provided, which can be used to re-derive some existing results and to obtain new ones.
Abstract: High-dimensional statistical inference deals with models in which the number of parameters $p$ is comparable to or larger than the sample size $n$. Since it is usually impossible to obtain consistent procedures unless $p/n\rightarrow0$, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse and structured matrices, low-rank matrices and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with some regularization function that encourages the assumed structure. This paper provides a unified framework for establishing consistency and convergence rates for such regularized $M$-estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive some existing results, and also to obtain a number of new results on consistency and convergence rates, in both $\ell_{2}$-error and related norms. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure corresponding regularized $M$-estimators have fast convergence rates and which are optimal in many well-studied cases.

911 citations
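A minimal sketch of one regularized $M$-estimator covered by this framework, assuming squared-error loss with the decomposable $\ell_1$ penalty (the lasso) and a plain proximal-gradient solver; the toy data and tuning value are illustrative, and this is not the paper's code.

```python
# Proximal gradient for the lasso: (1/2n)||y - Xw||^2 + lam * ||w||_1.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_proximal_gradient(X, y, lam, n_iter=500):
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)        # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n              # gradient of the squared-error loss
        w = soft_threshold(w - step * grad, step * lam)
    return w

# toy example: sparse ground truth, p larger than n
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
w_true = np.zeros(p); w_true[:5] = 2.0
y = X @ w_true + 0.5 * rng.standard_normal(n)
w_hat = lasso_proximal_gradient(X, y, lam=0.1)
print("nonzeros recovered:", np.flatnonzero(np.abs(w_hat) > 1e-3))
```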


Journal ArticleDOI
TL;DR: In this paper, the main types of statistical models based on latent variables, on copulas and on spatial max-stable processes are described and compared by application to a data set on rainfall in Switzerland.
Abstract: The areal modeling of the extremes of a natural process such as rainfall or temperature is important in environmental statistics; for example, understanding extreme areal rainfall is crucial in flood protection. This article reviews recent progress in the statistical modeling of spatial extremes, starting with sketches of the necessary elements of extreme value statistics and geostatistics. The main types of statistical models thus far proposed, based on latent variables, on copulas and on spatial max-stable processes, are described and then are compared by application to a data set on rainfall in Switzerland. Whereas latent variable modeling allows a better fit to marginal distributions, it fits the joint distributions of extremes poorly, so appropriately-chosen copula or max-stable models seem essential for successful spatial modeling of extremes.

572 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider situations where they are not only interested in sparsity, but where some structural prior knowledge is available as well, and show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables.
Abstract: Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the $\ell_1$-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of non-linear variable selection.

335 citations
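A hedged sketch of the disjoint-group case described above: the group-lasso norm built on groups of variables and its proximal operator (blockwise soft-thresholding), which is the basic computational building block for such structured penalties. The group partition below is purely illustrative.

```python
# Group-lasso norm sum_g ||w_g||_2 over disjoint groups, and its proximal operator.
import numpy as np

def group_lasso_norm(w, groups):
    """groups: list of index arrays partitioning {0, ..., p-1}."""
    return sum(np.linalg.norm(w[g]) for g in groups)

def prox_group_lasso(w, groups, t):
    """Proximal operator of t * sum_g ||w_g||_2: shrink each block toward zero."""
    out = w.copy()
    for g in groups:
        norm_g = np.linalg.norm(w[g])
        out[g] = 0.0 if norm_g <= t else (1.0 - t / norm_g) * w[g]
    return out

w = np.array([3.0, 4.0, 0.1, -0.1, 2.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
print(prox_group_lasso(w, groups, t=1.0))
# the small middle block is set exactly to zero; the other blocks are shrunk
```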


Journal ArticleDOI
TL;DR: In this article, the authors give a selective review of group selection methods for variable selection, concerning methodological developments, theoretical properties and computational algorithms.
Abstract: Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study.

281 citations


Journal ArticleDOI
TL;DR: A general theoretical framework is presented showing that under appropriate conditions, the global solution of nonconvex regularization leads to desirable recovery performance and corresponds to the unique sparse local solution, which can be obtained via different numerical procedures.
Abstract: Concave regularization methods provide natural procedures for sparse recovery. However, they are difficult to analyze in the high-dimensional setting. Only recently a few sparse recovery results have been established for some specific local solutions obtained via specialized numerical procedures. Still, the fundamental relationship between these solutions such as whether they are identical or their relationship to the global minimizer of the underlying nonconvex formulation is unknown. The current paper fills this conceptual gap by presenting a general theoretical framework showing that, under appropriate conditions, the global solution of nonconvex regularization leads to desirable recovery performance; moreover, under suitable conditions, the global solution corresponds to the unique sparse local solution, which can be obtained via different numerical procedures. Under this unified framework, we present an overview of existing results and discuss their connections. The unified view of this work leads to a more satisfactory treatment of concave high-dimensional sparse estimation procedures, and serves as a guideline for developing further numerical procedures for concave regularization.

278 citations
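As a concrete instance of the concave regularizers discussed above, here is a hedged sketch of the minimax concave penalty (MCP) and the univariate thresholding rule applied coordinate-wise in many numerical procedures for such penalties; the unit scaling of coordinates is an assumption of the sketch.

```python
# MCP penalty and its scalar thresholding operator (assumes unit-scaled coordinates,
# gamma > 1 controls the concavity).
import numpy as np

def mcp_penalty(t, lam, gamma):
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def mcp_threshold(z, lam, gamma):
    """Minimizer of 0.5*(z - b)^2 + MCP(b; lam, gamma) over b."""
    z = np.asarray(z, dtype=float)
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
    return np.where(np.abs(z) <= gamma * lam, soft / (1.0 - 1.0 / gamma), z)

z = np.linspace(-3, 3, 7)
print(mcp_threshold(z, lam=1.0, gamma=2.0))
# small inputs are set to zero; large inputs are left unshrunk (reduced bias)
```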


Journal ArticleDOI
TL;DR: In this paper, the authors review and assess estimators of fractal dimension by their large sample behavior under infill asymptotics, in extensive finite sample simulation studies, and in a data example on arctic sea-ice profiles.
Abstract: The fractal or Hausdorff dimension is a measure of roughness (or smoothness) for time series and spatial data. The graph of a smooth, differentiable surface indexed in $\mathbb{R}^{d}$ has topological and fractal dimension $d$. If the surface is nondifferentiable and rough, the fractal dimension takes values between the topological dimension, $d$, and $d+1$. We review and assess estimators of fractal dimension by their large sample behavior under infill asymptotics, in extensive finite sample simulation studies, and in a data example on arctic sea-ice profiles. For time series or line transect data, box-count, Hall–Wood, semi-periodogram, discrete cosine transform and wavelet estimators are studied along with variation estimators with power indices 2 (variogram) and 1 (madogram), all implemented in the R package fractaldim. Considering both efficiency and robustness, we recommend the use of the madogram estimator, which can be interpreted as a statistically more efficient version of the Hall–Wood estimator. For two-dimensional lattice data, we propose robust transect estimators that use the median of variation estimates along rows and columns. Generally, the link between power variations of index $p>0$ for stochastic processes, and the Hausdorff dimension of their sample paths, appears to be particularly robust and inclusive when $p=1$.

195 citations
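A minimal sketch of the recommended madogram estimator for time series or line-transect data, written independently of the R package fractaldim cited above: estimate the first-order variation at the two smallest lags, regress its logarithm on log lag, and set the estimated dimension to 2 minus the slope.

```python
# Madogram (power index p = 1) estimator of fractal dimension for a time series.
import numpy as np

def variation_estimate(x, lag, p=1.0):
    """Empirical p-th order variation at a given lag."""
    return np.mean(np.abs(x[lag:] - x[:-lag]) ** p)

def madogram_dimension(x, max_lag=2):
    lags = np.arange(1, max_lag + 1)
    logv = np.log([variation_estimate(x, int(l), p=1.0) for l in lags])
    slope = np.polyfit(np.log(lags), logv, 1)[0]
    return 2.0 - slope          # for time series the true dimension lies in [1, 2]

# sanity check: a Brownian motion sample path has fractal dimension 1.5
rng = np.random.default_rng(1)
bm = np.cumsum(rng.standard_normal(10_000))
print(round(madogram_dimension(bm), 2))   # close to 1.5
```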


Journal ArticleDOI
TL;DR: In this article, the sparsity pattern aggregation principle is used to derive sparsity oracle inequalities in several popular frameworks including ordinary sparsity, fused sparsity and group sparsity.
Abstract: Consider a regression model with fixed design and Gaussian noise where the regression function can potentially be well approximated by a function that admits a sparse representation in a given dictionary. This paper resorts to exponential weights to exploit this underlying sparsity by implementing the principle of sparsity pattern aggregation. This model selection take on sparse estimation allows us to derive sparsity oracle inequalities in several popular frameworks including ordinary sparsity, fused sparsity and group sparsity. One striking aspect of these theoretical results is that they hold under no condition on the dictionary. Moreover, we describe an efficient implementation of the sparsity pattern aggregation principle that compares favorably to state-of-the-art procedures on some basic numerical examples.

104 citations
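A toy, brute-force sketch of the sparsity pattern aggregation principle for very small $p$: least-squares fits on every sparsity pattern are averaged with exponential weights based on their residual sums of squares. The prior over patterns used below is a simple placeholder, not the paper's exact choice, and the enumeration is only feasible for tiny dictionaries.

```python
# Exponential-weights aggregation over all sparsity patterns (small p only).
from itertools import combinations
from math import comb
import numpy as np

def sparsity_pattern_aggregate(X, y, sigma2):
    n, p = X.shape
    thetas, rss, priors = [], [], []
    for k in range(p + 1):
        for S in combinations(range(p), k):
            theta = np.zeros(p)
            if S:
                theta[list(S)] = np.linalg.lstsq(X[:, list(S)], y, rcond=None)[0]
            thetas.append(theta)
            rss.append(np.sum((y - X @ theta) ** 2))
            priors.append(1.0 / ((p + 1) * comb(p, k)))   # simple prior over patterns
    rss = np.array(rss)
    w = np.array(priors) * np.exp(-(rss - rss.min()) / (4.0 * sigma2))
    w /= w.sum()
    return np.sum(w[:, None] * np.array(thetas), axis=0)

rng = np.random.default_rng(2)
n, p = 50, 6
X = rng.standard_normal((n, p))
theta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0])
y = X @ theta_true + rng.standard_normal(n)
print(np.round(sparsity_pattern_aggregate(X, y, sigma2=1.0), 2))
```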


Journal ArticleDOI
TL;DR: In this paper, the authors provide an updated overview of the Thurstonian and Bradley-Terry models in the analysis of paired comparison data, including how to account for object- and subject-specific covariates and how to deal with ordinal paired comparison data.
Abstract: Thurstonian and Bradley-Terry models are the most commonly applied models in the analysis of paired comparison data. Since their introduction, numerous developments of those models have been proposed in different areas. This paper provides an updated overview of these extensions, including how to account for object- and subject-specific covariates and how to deal with ordinal paired comparison data. Special emphasis is given to models for dependent comparisons. Although these models are more realistic, their use is complicated by numerical difficulties. We therefore concentrate on implementation issues. In particular, a pairwise likelihood approach is explored for models for dependent paired comparison data and a simulation study is carried out to compare the performance of maximum pairwise likelihood with other methods, such as limited information estimation. The methodology is illustrated throughout using a real data set about university paired comparisons performed by students.

104 citations
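A hedged sketch of the basic Bradley-Terry model for independent comparisons (the dependent-comparison and pairwise-likelihood extensions discussed above require more machinery): the probability that object i is preferred to object j is exp(a_i) / (exp(a_i) + exp(a_j)), fit by maximum likelihood. The win counts below are made-up toy data.

```python
# Maximum likelihood for a plain Bradley-Terry model with three objects.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(a, wins):
    """wins[i, j] = number of times object i was preferred to object j."""
    a = np.append(a, 0.0)               # fix the last ability at 0 for identifiability
    diff = a[:, None] - a[None, :]
    log_p = -np.log1p(np.exp(-diff))    # log P(i beats j)
    return -np.sum(wins * log_p)

wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]])            # toy comparison counts
res = minimize(neg_log_lik, x0=np.zeros(2), args=(wins,), method="BFGS")
abilities = np.append(res.x, 0.0)
print("estimated worth parameters:", np.round(np.exp(abilities), 2))
```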


Journal ArticleDOI
TL;DR: In this paper, the authors provide an overview of what has been learned about collaborative filtering and recommender systems from the legacy of the Netflix contest, and give a primarily statistical treatment of the lessons learned from this remarkable data set.
Abstract: Inspired by the legacy of the Netflix contest, we provide an overview of what has been learned—from our own efforts, and those of others—concerning the problems of collaborative filtering and recommender systems. The data set consists of about 100 million movie ratings (from 1 to 5 stars) involving some 480 thousand users and some 18 thousand movies; the associated ratings matrix is about 99% sparse. The goal is to predict ratings that users will give to movies; systems which can do this accurately have significant commercial applications, particularly on the world wide web. We discuss, in some detail, approaches to “baseline” modeling, singular value decomposition (SVD), as well as kNN (nearest neighbor) and neural network models; temporal effects, cross-validation issues, ensemble methods and other considerations are discussed as well. We compare existing models in a search for new models, and also discuss the mission-critical issues of penalization and parameter shrinkage which arise when the dimension of the parameter space reaches into the millions. Although much work on such problems has been carried out by the computer science and machine learning communities, our goal here is to address a statistical audience, and to provide a primarily statistical treatment of the lessons that have been learned from this remarkable set of data.

62 citations
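A hedged sketch of the "baseline plus SVD" style of model discussed above: a rating is approximated by an overall mean, user and item biases, and an inner product of latent factors, fit by regularized stochastic gradient descent. The tiny rating list stands in for the actual Netflix data, and the hyperparameters are illustrative.

```python
# Baseline + latent-factor ("SVD") model fit by penalized SGD on (user, item, rating) triples.
import numpy as np

def fit_svd(ratings, n_users, n_items, n_factors=5, lr=0.01, reg=0.05, n_epochs=50):
    mu = np.mean([r for _, _, r in ratings])          # global mean rating
    b_u, b_i = np.zeros(n_users), np.zeros(n_items)   # user and item biases
    rng = np.random.default_rng(0)
    P = 0.1 * rng.standard_normal((n_users, n_factors))
    Q = 0.1 * rng.standard_normal((n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            e = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (e - reg * b_u[u])          # ridge-type shrinkage on biases
            b_i[i] += lr * (e - reg * b_i[i])
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                          Q[i] + lr * (e * P[u] - reg * Q[i]))
    return mu, b_u, b_i, P, Q

ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
mu, b_u, b_i, P, Q = fit_svd(ratings, n_users=3, n_items=3)
pred = mu + b_u[0] + b_i[2] + P[0] @ Q[2]
print("predicted rating for user 0 on unseen item 2:", round(pred, 2))
```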


Journal ArticleDOI
TL;DR: Two approaches to building more flexible graphical models are discussed: one allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests.
Abstract: We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible graphical models. One allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests. Examples of both methods are presented. We also discuss possible future research directions for nonparametric graphical modeling.

58 citations
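A hedged sketch of the first approach, a nonparametric extension of the Gaussian (often called the nonparanormal): each variable is replaced by a Winsorized normal-score transform of its ranks, after which any Gaussian graphical model estimator can be applied to the transformed data. The truncation constant below is one standard choice, assumed here rather than taken from this paper.

```python
# Normal-score (nonparanormal-style) transform of each column of a data matrix.
import numpy as np
from scipy.stats import norm, rankdata

def nonparanormal_transform(X):
    n, p = X.shape
    delta = 1.0 / (4.0 * n**0.25 * np.sqrt(np.pi * np.log(n)))   # truncation level
    U = rankdata(X, axis=0) / (n + 1.0)                          # empirical CDF values
    U = np.clip(U, delta, 1.0 - delta)                           # Winsorize the tails
    Z = norm.ppf(U)
    return Z / Z.std(axis=0)                                     # unit-variance scores

rng = np.random.default_rng(3)
X = np.exp(rng.multivariate_normal(np.zeros(3), 0.5 * np.eye(3) + 0.5, size=200))
Z = nonparanormal_transform(X)        # Z is approximately jointly Gaussian
print(np.round(np.corrcoef(Z, rowvar=False), 2))
# a Gaussian graphical model (e.g., the graphical lasso) can now be fit to Z
```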


Journal ArticleDOI
TL;DR: In this paper, a review of recent results for high-dimensional sparse linear regression in the practical case of unknown variance is presented, including coordinate sparsity, group sparsity and variation sparsity.
Abstract: We review recent results for high-dimensional sparse linear regression in the practical case of unknown variance. Different sparsity settings are covered, including coordinate-sparsity, group-sparsity and variation-sparsity. The emphasis is put on nonasymptotic analyses and feasible procedures. In addition, a small numerical study compares the practical performance of three schemes for tuning the lasso estimator and some references are collected for some more general models, including multivariate regression and nonparametric regression.
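One scheme of the kind compared in such studies, usable when the noise variance is unknown, is the scaled (square-root) lasso, sketched below with scikit-learn's Lasso as the inner solver. The pilot level sqrt(2 log p / n), the iteration count and the simulated data are assumptions of the sketch, not this paper's exact study.

```python
# Scaled lasso: alternate a lasso fit and a residual-based noise-level update.
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, n_iter=20):
    n, p = X.shape
    lam0 = np.sqrt(2.0 * np.log(p) / n)
    sigma = np.std(y)                       # crude initial noise level
    for _ in range(n_iter):
        fit = Lasso(alpha=lam0 * sigma, fit_intercept=False).fit(X, y)
        sigma = np.linalg.norm(y - fit.predict(X)) / np.sqrt(n)
    return fit.coef_, sigma

rng = np.random.default_rng(7)
n, p = 100, 300
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:4] = 3.0
y = X @ beta + 2.0 * rng.standard_normal(n)   # true noise sd = 2
coef, sigma_hat = scaled_lasso(X, y)
print("estimated noise sd:", round(sigma_hat, 2))
print("selected coordinates:", np.flatnonzero(np.abs(coef) > 1e-6))
```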

Journal ArticleDOI
TL;DR: A computational strategy by which existing software can be modified to solve the augmented regularization problem, providing conditions under which such modification will converge to the optimum solution is outlined.
Abstract: Regularization methods allow one to handle a variety of inferential problems where there are more covariates than cases. This allows one to consider a potentially enormous number of covariates for a problem. We exploit the power of these techniques, supersaturating models by augmenting the “natural” covariates in the problem with an additional indicator for each case in the data set. We attach a penalty term for these case-specific indicators which is designed to produce a desired effect. For regression methods with squared error loss, an $\ell_{1}$ penalty produces a regression which is robust to outliers and high leverage cases; for quantile regression methods, an $\ell_{2}$ penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust. We provide a general framework for the inclusion of case-specific parameters in regularization problems, describing the impact on the effective loss for a variety of regression and classification problems. We outline a computational strategy by which existing software can be modified to solve the augmented regularization problem, providing conditions under which such modification will converge to the optimum solution. We illustrate the benefits of including case-specific parameters in the context of mean regression and quantile regression through analysis of NHANES and linguistic data sets.
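A minimal sketch of the case-specific-indicator idea for squared-error regression: add one indicator per case, put an $\ell_{1}$ penalty on those indicators, and alternate an OLS step with a soft-thresholding step. This simple alternating scheme illustrates the construction; it is not the paper's algorithm or its proposed software modification.

```python
# Robust regression via l1-penalized case-specific parameters (mean-shift formulation).
import numpy as np

def case_indicator_regression(X, y, lam, n_iter=100):
    n, p = X.shape
    gamma = np.zeros(n)                       # case-specific parameters
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = np.linalg.lstsq(X, y - gamma, rcond=None)[0]     # OLS on adjusted response
        r = y - X @ beta
        gamma = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)   # soft-threshold residuals
    return beta, gamma

rng = np.random.default_rng(4)
n, p = 100, 2
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0]) + 0.3 * rng.standard_normal(n)
y[:3] += 15.0                                 # three gross outliers
beta, gamma = case_indicator_regression(X, y, lam=2.0)
print("coefficients:", np.round(beta, 2))     # close to (1, -2) despite the outliers
print("flagged cases:", np.flatnonzero(gamma != 0))
```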

Journal ArticleDOI
TL;DR: In this article, the authors present a review of the evolution of theory that started when Charles Stein in 1955 (In Proc. 3rd Berkeley Sympos. Math. Statist. Probab. I (1956) 197-206, Univ. California Press) showed that using each separate sample mean from k ≥ 3 Normal populations to estimate its own population mean $\mu_{i}$ can be improved upon uniformly for every possible $\mu=(\mu_{1},\ldots,\mu_{k})'$.
Abstract: This review traces the evolution of theory that started when Charles Stein in 1955 (In Proc. 3rd Berkeley Sympos. Math. Statist. Probab. I (1956) 197-206, Univ. California Press) showed that using each separate sample mean from k ≥ 3 Normal populations to estimate its own population mean $\mu_{i}$ can be improved upon uniformly for every possible $\mu=(\mu_{1},\ldots,\mu_{k})'$. The dominating estimators, referred to here as being "Model-I minimax," can be found by shrinking the sample means toward any constant vector. Admissible minimax shrinkage estimators were derived by Stein and others as posterior means based on a random effects model, "Model-II" here, wherein the $\mu_{i}$ values have their own distributions. Section 2 centers on Figure 2, which organizes a wide class of priors on the unknown Level-II hyperparameters that have been proved to yield admissible Model-I minimax shrinkage estimators in the "equal variance case." Putting a flat prior on the Level-II variance is unique in this class for its scale-invariance and for its conjugacy, and it induces Stein's harmonic prior (SHP) on $\mu_{i}$. Component estimators with real data, however, often have substantially "unequal variances." While Model-I minimaxity is achievable in such cases, this standard requires estimators to have "reverse shrinkages," as when the large variance component sample means shrink less (not more) than the more accurate ones. Section 3 explains how Model-II provides appropriate shrinkage patterns, and investigates especially estimators determined exactly or approximately from the posterior distributions based on the objective priors that produce Model-I minimaxity in the equal variances case. While correcting the reversed shrinkage defect, Model-II minimaxity can hold for every component. In a real example of hospital profiling data, the SHP prior is shown to provide estimators that are Model-II minimax, and posterior intervals that have adequate Model-II coverage, that is, both conditionally on every possible Level-II hyperparameter and for every individual component $\mu_{i}$, $i=1,\ldots,k$.
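A small numerical illustration of the equal-variance, Model-I setting: the positive-part James-Stein estimator shrinks the k sample means toward the origin (one choice of constant vector) and achieves smaller total squared error than the raw means whenever k ≥ 3. The simulation parameters below are arbitrary.

```python
# Positive-part James-Stein shrinkage versus the raw sample means.
import numpy as np

def james_stein(xbar, sigma2):
    """Shrink toward 0; xbar holds the k sample means, each with variance sigma2."""
    k = len(xbar)
    shrinkage = max(0.0, 1.0 - (k - 2) * sigma2 / np.sum(xbar**2))
    return shrinkage * xbar

rng = np.random.default_rng(5)
k, sigma2, reps = 10, 1.0, 5_000
mu = rng.uniform(-1, 1, size=k)               # arbitrary true means
loss_mle = loss_js = 0.0
for _ in range(reps):
    xbar = mu + np.sqrt(sigma2) * rng.standard_normal(k)
    loss_mle += np.sum((xbar - mu) ** 2)
    loss_js += np.sum((james_stein(xbar, sigma2) - mu) ** 2)
print("average loss, raw means  :", round(loss_mle / reps, 2))   # about k = 10
print("average loss, James-Stein:", round(loss_js / reps, 2))    # strictly smaller
```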

Journal ArticleDOI
TL;DR: In this paper, the authors present a synthesis between the Bayesian and the frequentist points of view on small area estimation, and discuss several normal theory-based small area estimation techniques.
Abstract: The need for small area estimates is increasingly felt in both the public and private sectors in order to formulate their strategic plans. It is now widely recognized that direct small area survey estimates are highly unreliable owing to large standard errors and coefficients of variation. The reason behind this is that a survey is usually designed to achieve a specified level of accuracy at a higher level of geography than that of small areas. Lack of additional resources makes it almost imperative to use the same data to produce small area estimates. For example, if a survey is designed to estimate per capita income for a state, the same survey data need to be used to produce similar estimates for counties, subcounties and census divisions within that state. Thus, by necessity, small area estimation needs explicit, or at least implicit, use of models to link these areas. Improved small area estimates are found by “borrowing strength” from similar neighboring areas. The key to small area estimation is shrinkage of direct estimates toward some regression estimates obtained by using in addition administrative records and other available sources of information. These shrinkage estimates can often be motivated from both a Bayesian and a frequentist point of view, and indeed in this particular context, it is possible to obtain at least an operational synthesis between the two paradigms. Thus, on one hand, while small area estimates can be developed using a hierarchical Bayesian or an empirical Bayesian approach, similar estimates are also found using the theory of best linear unbiased prediction (BLUP) or empirical best linear unbiased prediction (EBLUP). The present article discusses primarily normal theory-based small area estimation techniques, and attempts a synthesis between both the Bayesian and the frequentist points of view. The results are mostly discussed for random effects models and their hierarchical Bayesian counterparts. A few miscellaneous remarks are made at the end describing the current research for more complex models including some nonnormal ones. Also provided are some pointers for future research.
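A hedged sketch of area-level shrinkage in the spirit described above: direct survey estimates with known sampling variances are shrunk toward a regression synthetic estimate, yielding EBLUP-type small area estimates. The crude moment estimate of the between-area variance used below is a simplification of the usual Prasad-Rao or maximum likelihood estimators, and the data are simulated.

```python
# Fay-Herriot-style shrinkage of direct area estimates toward a regression fit.
import numpy as np

def area_level_shrinkage(y, X, D):
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_ols
    A = max(0.0, np.mean(resid**2) - np.mean(D))       # crude between-area variance
    w = 1.0 / (A + D)                                  # GLS weights
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    B = D / (A + D)                                    # shrinkage toward the regression
    return B * (X @ beta) + (1.0 - B) * y              # EBLUP-type small area estimates

rng = np.random.default_rng(6)
m = 20                                                 # number of small areas
X = np.column_stack([np.ones(m), rng.standard_normal(m)])
theta = X @ np.array([5.0, 2.0]) + rng.standard_normal(m)   # true area means
D = rng.uniform(0.5, 3.0, size=m)                      # known, unequal sampling variances
y = theta + np.sqrt(D) * rng.standard_normal(m)        # unreliable direct estimates
print(np.round(area_level_shrinkage(y, X, D)[:5], 2))
```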

Journal ArticleDOI
TL;DR: In this article, the authors discuss consistency, asymptotic distribu- tion theory, information inequalities and their relations with efficiency and superefficiency for a general class of m-estimators.
Abstract: In some estimation problems, especially in applications dealing with information theory, signal processing and biology, theory provides us with additional information allowing us to restrict the parameter space to a finite number of points. In this case, we speak of discrete parameter models. Even though the problem is quite old and has interesting connections with testing and model selection, asymptotic theory for these models has hardly ever been studied. Therefore, we discuss consistency, asymptotic distribu- tion theory, information inequalities and their relations with efficiency and superefficiency for a general class of m-estimators.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss minimaxity and adaptive minimaxity in nonparametric function estimation with respect to three interrelated problems: function estimation under global integrated squared error, estimation under pointwise squared error, and nonparametric confidence intervals.
Abstract: Since Stein’s 1956 seminal paper, shrinkage has played a fundamental role in both parametric and nonparametric inference. This article discusses minimaxity and adaptive minimaxity in nonparametric function estimation. Three interrelated problems, function estimation under global integrated squared error, estimation under pointwise squared error, and nonparametric confidence intervals, are considered. Shrinkage is pivotal in the development of both the minimax theory and the adaptation theory. While the three problems are closely connected and the minimax theories bear some similarities, the adaptation theories are strikingly different. For example, in a sharp contrast to adaptive point estimation, in many common settings there do not exist nonparametric confidence intervals that adapt to the unknown smoothness of the underlying function. A concise account of these theories is given. The connections as well as differences among these problems are discussed and illustrated through examples.

Journal ArticleDOI
TL;DR: A brief review of quantum computation, quantum simulation and quantum information can be found in this article, where the basic concepts of quantum computation and quantum simulation are introduced and quantum algorithms are presented that are known to be much faster than the available classical algorithms.
Abstract: Quantum computation and quantum information are of great current interest in computer science, mathematics, physical sciences and engineering. They will likely lead to a new wave of technological innovations in communication, computation and cryptography. As the theory of quantum physics is fundamentally stochastic, randomness and uncertainty are deeply rooted in quantum computation, quantum simulation and quantum information. Consequently quantum algorithms are random in nature, and quantum simulation utilizes Monte Carlo techniques extensively. Thus statistics can play an important role in quantum computation and quantum simulation, which in turn offer great potential to revolutionize computational statistics. While only pseudo-random numbers can be generated by classical computers, quantum computers are able to produce genuine random numbers; quantum computers can exponentially or quadratically speed up median evaluation, Monte Carlo integration and Markov chain simulation. This paper gives a brief review on quantum computation, quantum simulation and quantum information. We introduce the basic concepts of quantum computation and quantum simulation and present quantum algorithms that are known to be much faster than the available classic algorithms. We provide a statistical framework for the analysis of quantum algorithms and quantum simulation.

Journal ArticleDOI
TL;DR: In this article, the authors propose a joint specification of the prior distribution across models so that sensitivity of posterior model probabilities to the dispersion of prior distributions for the parameters of individual models is diminished, and illustrate the behavior of inferential and predictive posterior quantities in linear and log-linear regressions under their proposed prior densities with a series of simulated and real data examples.
Abstract: We consider the specification of prior distributions for Bayesian model comparison, focusing on regression-type models. We propose a particular joint specification of the prior distribution across models so that sensitivity of posterior model probabilities to the dispersion of prior distributions for the parameters of individual models (Lindley’s paradox) is diminished. We illustrate the behavior of inferential and predictive posterior quantities in linear and log-linear regressions under our proposed prior densities with a series of simulated and real data examples.
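A small numerical illustration of the sensitivity the paper aims to diminish (Lindley's paradox): for fixed, mildly significant data, the Bayes factor for a point null against a normal alternative grows without bound as the prior on the alternative's parameter is made more diffuse. The numbers below are purely illustrative.

```python
# Bayes factor for H0: mu = 0 versus H1: mu ~ N(0, tau^2), as the prior sd tau grows.
import numpy as np
from scipy.stats import norm

n, sigma = 50, 1.0
se = sigma / np.sqrt(n)
xbar = 2.2 * se                         # a "z = 2.2" observation (two-sided p about 0.03)

for tau in [0.1, 1.0, 10.0, 100.0]:     # prior sd of mu under the alternative
    m0 = norm.pdf(xbar, loc=0.0, scale=se)                       # marginal under H0
    m1 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(tau**2 + se**2))  # marginal under H1
    print(f"prior sd {tau:6.1f}: Bayes factor B01 = {m0 / m1:9.2f}")
# the frequentist evidence is unchanged, yet B01 increasingly favors the null
```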

Journal ArticleDOI
TL;DR: The authors give an overview of various analysis methods for matched cohort studies with binary exposures and binary outcomes, as well as the underlying assumptions in these methods and how the methods compare in terms of statistical power.
Abstract: To improve confounder adjustments, observational studies are often matched on potential confounders. While matched case-control studies are common and well covered in the literature, our focus here is on matched cohort studies, which are less common and sparsely discussed in the literature. Matched data also arise naturally in twin studies, as a cohort of exposure-discordant twins can be viewed as being matched on a large number of potential confounders. The analysis of twin studies will be given special attention. We give an overview of various analysis methods for matched cohort studies with binary exposures and binary outcomes. In particular, our aim is to answer the following questions: (1) What are the target parameters in the common analysis methods? (2) What are the underlying assumptions in these methods? (3) How do the methods compare in terms of statistical power?

Journal ArticleDOI
TL;DR: In this article, the authors describe a method for a model-based analysis of clinical safety data called multivariate Bayesian logistic regression (MBLR), which allows information from the different issues to "borrow strength" from each other.
Abstract: This paper describes a method for a model-based analysis of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a set of medically related issues, or response variables, and MBLR allows information from the different issues to "borrow strength" from each other. The method is especially suited to sparse response data, as often occurs when fine-grained adverse events are collected from subjects in studies sized more for efficacy than for safety investigations. A combined analysis of data from multiple studies can be performed and the method enables a search for vulnerable subgroups based on the covariates in the regression model. An example involving 10 medically related issues from a pool of 8 studies is presented, as well as simulations showing distributional properties of the method.

Journal ArticleDOI
TL;DR: In this article, an extension of the oracle results to the case of quasi-likelihood loss is presented, and the results are derived under fourth moment conditions on the error distribution.
Abstract: We consider the theory for the high-dimensional generalized linear model with the Lasso. After a short review on theoretical results in literature, we present an extension of the oracle results to the case of quasi-likelihood loss. We prove bounds for the prediction error and $\ell_{1}$-error. The results are derived under fourth moment conditions on the error distribution. The case of robust loss is also given. We moreover show that under an irrepresentable condition, the $\ell_{1}$-penalized quasi-likelihood estimator has no false positives.

Journal ArticleDOI
TL;DR: In a remarkable series of papers beginning in 1956, Charles Stein set the stage for the future development of minimax shrinkage estimators of a multivariate normal mean under quadratic loss as mentioned in this paper.
Abstract: In a remarkable series of papers beginning in 1956, Charles Stein set the stage for the future development of minimax shrinkage estimators of a multivariate normal mean under quadratic loss. More recently, parallel developments have seen the emergence of minimax shrinkage estimators of multivariate normal predictive densities under Kullback–Leibler risk. We here describe these parallels emphasizing the focus on Bayes procedures and the derivation of the superharmonic conditions for minimaxity as well as further developments of new minimax shrinkage predictive density estimators including multiple shrinkage estimators, empirical Bayes estimators, normal linear model regression estimators and nonparametric regression estimators.

Journal ArticleDOI
TL;DR: The possibility of improving on the usual multivariate normal confidence set was first discussed in Stein (1962); using the ideas of shrinkage, through Bayesian and empirical Bayesian arguments, domination results, both analytic and numerical, have been obtained, as traced in this paper.
Abstract: The possibility of improving on the usual multivariate normal confidence set was first discussed in Stein (1962). Using the ideas of shrinkage, through Bayesian and empirical Bayesian arguments, domination results, both analytic and numerical, have been obtained. Here we trace some of the developments in confidence set estimation.

Journal ArticleDOI
TL;DR: In this article, the authors present an expository development of loss estimation with substantial emphasis on the setting where the distributional context is normal and its extension to the case where the underlying distribution is spherically symmetric.
Abstract: Let $X$ be a random vector with distribution $P_{\theta}$ where $\theta$ is an unknown parameter. When estimating $\theta$ by some estimator $\varphi(X)$ under a loss function $L(\theta,\varphi)$, classical decision theory advocates that such a decision rule should be used if it has suitable properties with respect to the frequentist risk $R(\theta,\varphi)$. However, after having observed $X=x$, instances arise in practice in which $\varphi$ is to be accompanied by an assessment of its loss, $L(\theta,\varphi(x))$, which is unobservable since $\theta$ is unknown. A common approach to this assessment is to consider estimation of $L(\theta,\varphi(x))$ by an estimator $\delta$, called a loss estimator. We present an expository development of loss estimation with substantial emphasis on the setting where the distributional context is normal and its extension to the case where the underlying distribution is spherically symmetric. Our overview covers improved loss estimators for least squares but primarily focuses on shrinkage estimators. Bayes estimation is also considered and comparisons are made with unbiased estimation.

Journal ArticleDOI
TL;DR: In this article, a geometrical explanation for the inadmissibility of the usual estimator of a multivariate normal mean is presented, which is based on the spherical symmetry of the problem.
Abstract: Shrinkage estimation has become a basic tool in the analysis of high-dimensional data. Historically and conceptually a key development toward this was the discovery of the inadmissibility of the usual estimator of a multivariate normal mean. This article develops a geometrical explanation for this inadmissibility. By exploiting the spherical symmetry of the problem it is possible to effectively conceptualize the multidimensional setting in a two-dimensional framework that can be easily plotted and geometrically analyzed. We begin with the heuristic explanation for inadmissibility that was given by Stein (In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954-1955, Vol. I (1956) 197-206, Univ. California Press). Some geometric figures are included to make this reasoning more tangible. It is also explained why Stein's argument falls short of yielding a proof of inadmissibility, even when the dimension, p, is much larger than p = 3. We then extend the geometric idea to yield increasingly persuasive arguments for inadmissibility when p ≥ 3, albeit at the cost of increased geometric and computational detail.

Journal Article
TL;DR: In this paper, the convergence properties of component-wise Markov chain Monte Carlo (MCMC) simulations are investigated and the connections between the convergence rates of the various component-wise strategies are analyzed.
Abstract: It is common practice in Markov chain Monte Carlo to update the simulation one variable (or sub-block of variables) at a time, rather than conduct a single full-dimensional update. When it is possible to draw from each full-conditional distribution associated with the target this is just a Gibbs sampler. Often at least one of the Gibbs updates is replaced with a Metropolis-Hastings step, yielding a Metropolis-Hastings-within-Gibbs algorithm. Strategies for combining component-wise updates include composition, random sequence and random scans. While these strategies can ease MCMC implementation and produce superior empirical performance compared to full-dimensional updates, the theoretical convergence properties of the associated Markov chains have received limited attention. We present conditions under which some component-wise Markov chains converge to the stationary distribution at a geometric rate. We pay particular attention to the connections between the convergence rates of the various component-wise strategies. This is important since it ensures the existence of tools that an MCMC practitioner can use to be as confident in the simulation results as if they were based on independent and identically distributed samples. We illustrate our results in two examples including a hierarchical linear mixed model and one involving maximum likelihood estimation for mixed models.
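A hedged sketch of a deterministic-scan Metropolis-Hastings-within-Gibbs sampler for a toy bivariate target: one coordinate is updated from its exact full conditional (a Gibbs step), the other by a random-walk Metropolis step. The target density and proposal scale are assumptions of the sketch, not taken from the paper's examples.

```python
# Deterministic-scan Metropolis-within-Gibbs for p(x, y) proportional to
# exp(-0.5*(x - y)^2 - 0.5*y^2), i.e. X | Y ~ N(Y, 1) and Y ~ N(0, 1).
import numpy as np

def log_target(x, y):
    return -0.5 * (x - y) ** 2 - 0.5 * y ** 2

def mwg_sampler(n_iter=10_000, prop_sd=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(loc=y, scale=1.0)              # Gibbs: draw from X | Y exactly
        y_prop = y + prop_sd * rng.standard_normal()  # random-walk Metropolis for Y
        log_alpha = log_target(x, y_prop) - log_target(x, y)
        if np.log(rng.uniform()) < log_alpha:
            y = y_prop
        out[t] = (x, y)
    return out

draws = mwg_sampler()
print("posterior means:", np.round(draws.mean(axis=0), 2))   # both near 0
print("marginal sd of X:", round(draws[:, 0].std(), 2))      # near sqrt(2)
```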

Journal ArticleDOI
TL;DR: In this paper, a review of advances in Stein-type shrinkage estimation for spherically symmetric distributions is presented, where the main focus is on distributional robustness results in cases where a residual vector is available to estimate an unknown scale parameter.
Abstract: This paper reviews advances in Stein-type shrinkage estimation for spherically symmetric distributions. Some emphasis is placed on developing intuition as to why shrinkage should work in location problems whether the underlying population is normal or not. Considerable attention is devoted to generalizing the "Stein lemma" which underlies much of the theoretical development of improved minimax estimation for spherically symmetric distributions. A main focus is on distributional robustness results in cases where a residual vector is available to estimate an unknown scale parameter, and, in particular, in finding estimators which are simultaneously generalized Bayes and minimax over large classes of spherically symmetric distributions. Some attention is also given to the problem of estimating a location vector restricted to lie in a polyhedral cone.

Journal ArticleDOI
TL;DR: In this paper, the authors make the link between extreme-value copulas and max-stable processes explicit and review the existing nonparametric inference methods for the two types of processes.
Abstract: The choice for parametric techniques in the discussion article is motivated by the claim that for multivariate extreme-value distributions, “owing to the curse of dimensionality, nonparametric estimation has essentially been confined to the bivariate case” (Section 2.3). Thanks to recent developments, this is no longer true if data take the form of multivariate maxima, as is the case in the article. A wide range of nonparametric, rank-based estimators and tests are nowadays available for extreme-value copulas. Since max-stable processes have extreme-value copulas, these methods are applicable for inference on max-stable processes too. The aim of this note is to make the link between extreme-value copulas and max-stable processes explicit and to review the existing nonparametric inference methods.

Journal ArticleDOI
TL;DR: In this article, the author contributes a discussion of "Statistical Modeling of Spatial Extremes" by A. C. Davison, S. A. Padoan and M. Ribatet.
Abstract: Discussion of "Statistical Modeling of Spatial Extremes" by A. C. Davison, S. A. Padoan and M. Ribatet [arXiv:1208.3378].

Journal ArticleDOI
TL;DR: This special issue on minimax shrinkage estimation is devoted to developments that ultimately arose from Stein's investigations into improving on the UMVUE of a multivariate normal mean vector.
Abstract: In 1956, Charles Stein published an article that was to forever change the statistical approach to high-dimensional estimation. His stunning discovery that the usual estimator of the normal mean vector could be dominated in dimensions 3 and higher amazed many at the time, and became the catalyst for a vast and rich literature of substantial importance to statistical theory and practice. As a tribute to Charles Stein, this special issue on minimax shrinkage estimation is devoted to developments that ultimately arose from Stein’s investigations into improving on the UMVUE of a multivariate normal mean vector. Of course, much of the early literature on the subject was due to Stein himself, including a key technical lemma commonly referred to as Stein’s Lemma, which leads to an unbiased estimator of the risk of an almost arbitrary estimator of the mean vector. The following ten papers assembled in this volume represent some of the many areas into which shrinkage has expanded (a one-dimensional pun, no doubt). Clearly, the shrinkage literature has branched out substantially since 1956, the many contributors and the breadth of theory and practice being now far too large to cover with any degree of completeness in a review issue such as this one. But what these papers do show is the lasting impact of Stein (1956), and the ongoing vitality of the huge area that he catalyzed.