
Showing papers in "Statistics and Computing in 2011"


Journal ArticleDOI
TL;DR: This paper investigates two families of estimates that connect the training error, which has a downward bias, and K-fold cross-validation, which has an upward bias.
Abstract: Estimation of prediction accuracy is important when our aim is prediction. The training error is an easy estimate of prediction error, but it has a downward bias. On the other hand, K-fold cross-validation has an upward bias. The upward bias may be negligible in leave-one-out cross-validation, but it sometimes cannot be neglected in 5-fold or 10-fold cross-validation, which are favored from a computational standpoint. Since the training error has a downward bias and K-fold cross-validation has an upward bias, there will be an appropriate estimate in a family that connects the two estimates. In this paper, we investigate two families that connect the training error and K-fold cross-validation.

676 citations
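
As a rough illustration of the two estimates whose biases bracket the prediction error, here is a minimal Python sketch computing the training error and the K-fold cross-validation error for ordinary least squares, together with a simple convex combination linking them (this specific family is a hypothetical stand-in, not necessarily one of the two families studied in the paper):

```python
# Illustration (not the paper's exact estimators): training error vs. K-fold CV
# for ordinary least squares, plus a convex-combination family linking them.
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 100, 5, 5
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)

def mse(X, y, coef):
    r = y - X @ coef
    return float(r @ r) / len(y)

# Training error: fit and evaluate on the same data (downward bias).
coef_full = np.linalg.lstsq(X, y, rcond=None)[0]
err_train = mse(X, y, coef_full)

# K-fold CV: evaluate each fold with a model fitted on the other folds (upward bias).
folds = np.array_split(rng.permutation(n), K)
err_cv = 0.0
for test in folds:
    train = np.setdiff1d(np.arange(n), test)
    coef = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
    err_cv += mse(X[test], y[test], coef) * len(test) / n

# A simple family connecting the two estimates (hypothetical, for illustration):
for lam in (0.0, 0.5, 1.0):
    print(f"lambda={lam:.1f}: {(1 - lam) * err_train + lam * err_cv:.3f}")
```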


Journal ArticleDOI
TL;DR: A more efficient version of the slice sampler for Dirichlet process mixture models described by Walker is proposed, allowing for the fitting of infinite mixture models with a wide range of prior specifications, including priors defined through infinite sequences of independent positive random variables.
Abstract: We propose a more efficient version of the slice sampler for Dirichlet process mixture models described by Walker (Commun. Stat. Simul. Comput. 36:45–54, 2007). This new sampler allows for the fitting of infinite mixture models with a wide range of prior specifications. To illustrate this flexibility we consider priors defined through infinite sequences of independent positive random variables. Two applications are considered: density estimation using mixture models and hazard function estimation. In each case we show how the slice-efficient sampler can be applied to make inference in the models. In the mixture case, two submodels are studied in detail. The first one assumes that the positive random variables are Gamma distributed and the second assumes that they are inverse-Gaussian distributed. Both priors have two hyperparameters and we consider their effect on the prior distribution of the number of occupied clusters in a sample. Extensive computational comparisons with alternative "conditional" simulation techniques for mixture models using the standard Dirichlet process prior and our new priors are made. The properties of the new priors are illustrated on a density estimation problem.

371 citations


Journal ArticleDOI
TL;DR: A general Monte Carlo method based on Nested Sampling is introduced for sampling complex probability distributions and estimating the normalising constant; it is found to achieve four times the accuracy of classic MCMC-based Nested Sampling.
Abstract: We introduce a general Monte Carlo method based on Nested Sampling (NS), for sampling complex probability distributions and estimating the normalising constant. The method uses one or more particles, which explore a mixture of nested probability distributions, each successive distribution occupying approximately e^{-1} times the enclosed prior mass of the previous distribution. While NS technically requires independent generation of particles, Markov Chain Monte Carlo (MCMC) exploration fits naturally into this technique. We illustrate the new method on a test problem and find that it can achieve four times the accuracy of classic MCMC-based Nested Sampling, for the same computational effort; equivalent to a factor of 16 speedup. An additional benefit is that more samples and a more accurate evidence value can be obtained simply by continuing the run for longer, as in standard MCMC.

155 citations
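
For context, the standard nested-sampling bookkeeping (in the style of Skilling) behind the e^{-1} shrinkage mentioned in the abstract can be written as follows: with N particles, the prior mass X_i enclosed by the i-th likelihood contour shrinks geometrically, and the evidence Z is accumulated as a weighted sum of the discarded likelihood values.

```latex
X_i \approx e^{-i/N}, \qquad
Z = \int_0^1 L(X)\,\mathrm{d}X \;\approx\; \sum_i L_i \,(X_{i-1} - X_i).
```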


Journal ArticleDOI
Nicolas Chopin
TL;DR: For simulating a Gaussian vector X conditional on each component lying in a finite or semi-finite interval, this work designs a one-dimensional table-based algorithm that is computationally faster than alternative algorithms, and a two-dimensional accept-reject algorithm.
Abstract: We consider the problem of simulating a Gaussian vector X, conditional on the fact that each component of X belongs to a finite interval [a_i, b_i], or a semi-finite interval [a_i, +∞). In the one-dimensional case, we design a table-based algorithm that is computationally faster than alternative algorithms. In the two-dimensional case, we design an accept-reject algorithm. According to our calculations and numerical studies, the acceptance rate of this algorithm is bounded from below by 0.5 for semi-finite truncation intervals, and by 0.47 for finite intervals. Extension to three or more dimensions is discussed.

137 citations
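
For comparison, one of the classical alternatives in the one-dimensional case (not Chopin's table-based algorithm itself) is the accept-reject sampler of Robert (1995) for the semi-finite interval [a, +∞), which uses a shifted-exponential proposal; a minimal sketch:

```python
# Classical accept-reject sampler for X ~ N(0,1) truncated to [a, +inf)
# (Robert, 1995) -- one of the "alternative algorithms" the table-based
# method is compared against, not Chopin's algorithm itself.
import math
import random

def truncnorm_lower(a: float) -> float:
    if a <= 0.0:
        # Cheap: sample the untruncated normal until it lands in [a, inf).
        while True:
            x = random.gauss(0.0, 1.0)
            if x >= a:
                return x
    # Shifted-exponential proposal with the optimal rate alpha.
    alpha = (a + math.sqrt(a * a + 4.0)) / 2.0
    while True:
        x = a + random.expovariate(alpha)
        # Acceptance probability exp(-(x - alpha)^2 / 2) makes the
        # accepted draws exactly truncated-normal.
        if random.random() <= math.exp(-(x - alpha) ** 2 / 2.0):
            return x

samples = [truncnorm_lower(2.0) for _ in range(5)]
print(samples)
```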


Journal ArticleDOI
TL;DR: This paper presents a new quasi-Newton acceleration scheme that requires only modest increments in computation per iteration and overall storage and rivals or surpasses the performance of SQUAREM on several representative test problems.
Abstract: In many statistical problems, maximum likelihood estimation by an EM or MM algorithm suffers from excruciatingly slow convergence. This tendency limits the application of these algorithms to modern high-dimensional problems in data mining, genomics, and imaging. Unfortunately, most existing acceleration techniques are ill-suited to complicated models involving large numbers of parameters. The squared iterative methods (SQUAREM) recently proposed by Varadhan and Roland constitute one notable exception. This paper presents a new quasi-Newton acceleration scheme that requires only modest increments in computation per iteration and overall storage and rivals or surpasses the performance of SQUAREM on several representative test problems.

134 citations
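
A minimal sketch of the SQUAREM extrapolation step (Varadhan and Roland) that the new quasi-Newton scheme is benchmarked against may help orient the reader; the fixed-point map F below is a toy contraction standing in for an EM/MM update, and the steplength is the usual S3 choice:

```python
# Minimal sketch of one SQUAREM-style extrapolation step (Varadhan & Roland),
# the baseline the quasi-Newton scheme is compared against.  The fixed-point
# map F here is a toy stand-in for an EM/MM update.
import numpy as np

def F(theta):
    return np.cos(theta)  # toy contraction; replace with an EM/MM update

def squarem_step(theta0):
    theta1 = F(theta0)
    theta2 = F(theta1)
    r = theta1 - theta0                 # first difference
    v = (theta2 - theta1) - r           # second difference
    alpha = -np.linalg.norm(r) / np.linalg.norm(v)  # steplength (scheme S3)
    theta_new = theta0 - 2.0 * alpha * r + alpha**2 * v
    return F(theta_new)                 # stabilising F-evaluation

theta = np.array([1.0])
for _ in range(5):
    theta = squarem_step(theta)
print(theta)  # converges to the fixed point of cos, ~0.739085
```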


Journal ArticleDOI
TL;DR: The extension of the mixtures of multivariate t-factor analyzers model is described to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices to create a family of six mixture models, including parsimonious models.
Abstract: Model-based clustering typically involves the development of a family of mixture models and the imposition of these models upon data. The best member of the family is then chosen using some criterion and the associated parameter estimates lead to predicted group memberships, or clusterings. This paper describes the extension of the mixtures of multivariate t-factor analyzers model to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices. The result is a family of six mixture models, including parsimonious models. Parameter estimates for this family of models are derived using an alternating expectation-conditional maximization algorithm and convergence is determined based on Aitken's acceleration. Model selection is carried out using the Bayesian information criterion (BIC) and the integrated completed likelihood (ICL). This novel family of mixture models is then applied to simulated and real data where clustering performance meets or exceeds that of established model-based clustering methods. The simulation studies include a comparison of the BIC and the ICL as model selection techniques for this novel family of models. Application to simulated data with larger dimensionality is also explored.

131 citations


Journal ArticleDOI
TL;DR: The study of the finite-sample performance of the multiplier version of the goodness-of-fit test for bivariate one-parameter copulas showed that it provides a valid alternative to the parametric bootstrap-based test while being orders of magnitude faster.
Abstract: Recent large scale simulations indicate that a powerful goodness-of-fit test for copulas can be obtained from the process comparing the empirical copula with a parametric estimate of the copula derived under the null hypothesis. A first way to compute approximate p-values for statistics derived from this process consists of using the parametric bootstrap procedure recently thoroughly revisited by Genest and Remillard. Because it heavily relies on random number generation and estimation, the resulting goodness-of-fit test has a very high computational cost that can be regarded as an obstacle to its application as the sample size increases. An alternative approach proposed by the authors consists of using a multiplier procedure. The study of the finite-sample performance of the multiplier version of the goodness-of-fit test for bivariate one-parameter copulas showed that it provides a valid alternative to the parametric bootstrap-based test while being orders of magnitude faster. The aim of this work is to extend the multiplier approach to multivariate multiparameter copulas and study the finite-sample performance of the resulting test. Particular emphasis is put on elliptical copulas such as the normal and the t as these are flexible models in a multivariate setting. The implementation of the procedure for the latter copulas proves challenging and requires the extension of the Plackett formula for the t distribution to arbitrary dimension. Extensive Monte Carlo experiments, which could be carried out only because of the good computational properties of the multiplier approach, confirm in the multivariate multiparameter context the satisfactory behavior of the goodness-of-fit test.

122 citations


Journal ArticleDOI
TL;DR: It is proved that, under certain conditions, it is optimal (in terms of maximising the expected squared jumping distance) to space the temperatures so that the proportion of temperature swaps which are accepted is approximately 0.234.
Abstract: We consider optimal temperature spacings for Metropolis-coupled Markov chain Monte Carlo (MCMCMC) and Simulated Tempering algorithms. We prove that, under certain conditions, it is optimal (in terms of maximising the expected squared jumping distance) to space the temperatures so that the proportion of temperature swaps which are accepted is approximately 0.234. This generalises related work by physicists, and is consistent with previous work about optimal scaling of random-walk Metropolis algorithms.

112 citations
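
A small numerical illustration of the 0.234 rule, under simplifying assumptions: for a N(0,1) target tempered as pi_beta(x) ∝ exp(−βx²/2), each tempered chain is exactly N(0, 1/β), so the swap acceptance rate between adjacent inverse temperatures can be estimated by direct simulation, and the spacing tuned by bisection:

```python
# Monte Carlo illustration of the 0.234 rule for temperature swaps.  For a
# N(0,1) target tempered as pi_beta(x) ~ exp(-beta * x^2 / 2), each tempered
# chain is exactly N(0, 1/beta), so swap acceptance between adjacent inverse
# temperatures can be estimated without running MCMC.
import numpy as np

rng = np.random.default_rng(1)

def swap_acceptance(beta1, beta2, n=200_000):
    x1 = rng.normal(0.0, 1.0 / np.sqrt(beta1), n)  # chain at beta1
    x2 = rng.normal(0.0, 1.0 / np.sqrt(beta2), n)  # chain at beta2
    log_ratio = (beta1 - beta2) * (x1**2 - x2**2) / 2.0
    return np.mean(np.minimum(1.0, np.exp(log_ratio)))

# Bisection on the ratio rho = beta2/beta1 to hit ~0.234 swap acceptance.
lo, hi = 0.001, 0.999
for _ in range(30):
    rho = 0.5 * (lo + hi)
    if swap_acceptance(1.0, rho) < 0.234:
        lo = rho  # spacing too wide: bring the temperatures closer
    else:
        hi = rho
print(f"beta ratio giving ~23.4% swap acceptance: {rho:.3f}")
```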


Journal ArticleDOI
TL;DR: This paper proposes two approaches for estimating multiple related graphs, by rendering the closeness assumption into an empirical prior or group penalties, and provides quantitative results demonstrating the benefits of the proposed approaches.
Abstract: Gaussian Graphical Models provide a convenient framework for representing dependencies between variables. Recently, this tool has received high interest for the discovery of biological networks. The literature focuses on the case where a single network is inferred from a set of measurements. But, as wet-lab data is typically scarce, several assays, in which the experimental conditions affect interactions, are usually merged to infer a single network. In this paper, we propose two approaches for estimating multiple related graphs, by rendering the closeness assumption into an empirical prior or group penalties. We provide quantitative results demonstrating the benefits of the proposed approaches. The methods presented in this paper are embedded in the R package simone from version 1.0-0 and later.

102 citations


Journal ArticleDOI
TL;DR: This work defines and explores finite mixtures of matrix normals and shows that the proposed mixture model can be a powerful tool for classifying three-way data both in supervised and unsupervised problems.
Abstract: Matrix-variate distributions represent a natural way for modeling random matrices. Realizations from random matrices are generated by the simultaneous observation of variables in different situations or locations, and are commonly arranged in three-way data structures. Among the matrix-variate distributions, the matrix normal density plays the same pivotal role as the multivariate normal distribution in the family of multivariate distributions. In this work we define and explore finite mixtures of matrix normals. An EM algorithm for the model estimation is developed and some useful properties are demonstrated. We finally show that the proposed mixture model can be a powerful tool for classifying three-way data both in supervised and unsupervised problems. A simulation study and some real examples are presented.

94 citations
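
For reference, the matrix normal density that plays this pivotal role is, in the usual parameterisation for an n×p random matrix X with mean M, row covariance U (n×n) and column covariance V (p×p):

```latex
f(X \mid M, U, V) =
\frac{\exp\!\left(-\tfrac{1}{2}\,\operatorname{tr}\!\left[V^{-1}(X-M)^{\top}U^{-1}(X-M)\right]\right)}
     {(2\pi)^{np/2}\,\lvert U\rvert^{p/2}\,\lvert V\rvert^{n/2}}.
```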


Journal ArticleDOI
TL;DR: Some exploratory “trimming-based” tools are presented in this work; the monitoring of optimal values reached when solving a robust clustering criterion and the use of some “discriminant” factors are the basis for these exploratory tools.
Abstract: Two key questions in clustering problems are how to determine the number of groups properly and how to measure the strength of group-assignments. These questions are especially involved when the presence of a certain fraction of outlying data is also expected. Any answer to these two key questions should depend on the assumed probabilistic model, the allowed group scatters and what we understand by noise. With this in mind, some exploratory "trimming-based" tools are presented in this work together with their justifications. The monitoring of optimal values reached when solving a robust clustering criterion and the use of some "discriminant" factors are the basis for these exploratory tools.

Journal ArticleDOI
TL;DR: In this article, a fast new algorithm is proposed for numerical computation of (approximate) D-optimal designs, the cocktail algorithm, which extends the well-known vertex direction method (VDM) and the multiplicative algorithm.
Abstract: A fast new algorithm is proposed for numerical computation of (approximate) D-optimal designs. This cocktail algorithm extends the well-known vertex direction method (VDM; Fedorov in Theory of Optimal Experiments, 1972) and the multiplicative algorithm (Silvey et al. in Commun. Stat. Theory Methods 14:1379–1389, 1978), and shares their simplicity and monotonic convergence properties. Numerical examples show that the cocktail algorithm can lead to dramatically improved speed, sometimes by orders of magnitude, relative to either the multiplicative algorithm or the vertex exchange method (a variant of VDM). Key to the improved speed is a new nearest neighbor exchange strategy, which acts locally and complements the global effect of the multiplicative algorithm. Possible extensions to related problems such as nonparametric maximum likelihood estimation are mentioned.
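
As a point of reference, the multiplicative algorithm that the cocktail algorithm builds on has a one-line update: with candidate points x_i and weights w, the variance function is d_i(w) = x_iᵀ M(w)⁻¹ x_i and the update is w_i ← w_i d_i(w)/p. A minimal Python sketch on a hypothetical quadratic-regression example (the nearest neighbor exchange step of the cocktail algorithm is not shown):

```python
# Sketch of the classical multiplicative algorithm for approximate D-optimal
# design weights (one ingredient of the cocktail algorithm; the nearest
# neighbour exchange step is not shown).  Candidate design points are the
# rows of X; the example is quadratic regression on a grid.
import numpy as np

grid = np.linspace(-1.0, 1.0, 21)
X = np.column_stack([np.ones_like(grid), grid, grid**2])  # design matrix
n, p = X.shape

w = np.full(n, 1.0 / n)                     # uniform initial weights
for _ in range(500):
    M = X.T @ (w[:, None] * X)              # information matrix M(w)
    d = np.einsum("ij,jk,ik->i", X, np.linalg.inv(M), X)  # variance function d_i(w)
    w *= d / p                              # multiplicative update (weights stay normalised)
print(grid[w > 1e-3], w[w > 1e-3])          # mass concentrates on {-1, 0, 1}
```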

Journal ArticleDOI
TL;DR: This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge of the variations of the noise, and proposes a new family of change-point detection procedures, showing that robustness to heteroscedasticity can indeed be required for their analysis.
Abstract: This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge of the variations of the noise. A new family of change-point detection procedures is proposed, showing that cross-validation methods can be successful in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. The robustness to heteroscedasticity of the proposed procedures is supported by an extensive simulation study, together with recent partial theoretical results. An application to Comparative Genomic Hybridization (CGH) data is provided, showing that robustness to heteroscedasticity can indeed be required for the analysis of such data.

Journal ArticleDOI
TL;DR: Four methods for estimating single or multiple change points in a regression model, when both the error variance and regression coefficients change simultaneously at the unknown point(s), are compared: the Bayesian, Julious, grid search, and segmented methods.
Abstract: We consider two problems concerning locating change points in a linear regression model. One involves jump discontinuities (change-point) in a regression model and the other involves regression lines connected at unknown points. We compare four methods for estimating single or multiple change points in a regression model, when both the error variance and regression coefficients change simultaneously at the unknown point(s): the Bayesian, Julious, grid search, and segmented methods. The proposed methods are evaluated via a simulation study and compared via some standard measures of estimation bias and precision. Finally, the methods are illustrated and compared using three real data sets. The simulation and empirical results overall favor both the segmented and Bayesian methods of estimation, which simultaneously estimate the change point and the other model parameters, though only the Bayesian method is able to handle both continuous and discontinuous change point problems successfully. If it is known that the regression lines are continuous, then the segmented method ranks first among the methods considered.

Journal ArticleDOI
TL;DR: This work proposes an efficient online algorithm for sampling from an approximation to the posterior distribution of the number and position of the changepoints of a class of multiple changepoint models, and illustrates the power of this approach through fitting piecewise polynomial models to data.
Abstract: We consider Bayesian analysis of a class of multiple changepoint models. While there are a variety of efficient ways to analyse these models if the parameters associated with each segment are independent, there are few general approaches for models where the parameters are dependent. Under the assumption that the dependence is Markov, we propose an efficient online algorithm for sampling from an approximation to the posterior distribution of the number and position of the changepoints. In a simulation study, we show that the error introduced by the approximation is negligible. We illustrate the power of our approach through fitting piecewise polynomial models to data, under a model which allows for either continuity or discontinuity of the underlying curve at each changepoint. This method is competitive with, or outperforms, other methods for inferring curves from noisy data; and, uniquely, it allows for inference of the locations of discontinuities in the underlying curve.

Journal ArticleDOI
TL;DR: A Bayesian forecasting methodology for discrete-time finite state-space hidden Markov models with a non-constant transition matrix depending on a set of exogenous covariates is presented, allowing for model uncertainty regarding the set of covariates that affect the transition matrix.
Abstract: We present a Bayesian forecasting methodology of discrete-time finite state-space hidden Markov models with non-constant transition matrix that depends on a set of exogenous covariates. We describe an MCMC reversible jump algorithm for predictive inference, allowing for model uncertainty regarding the set of covariates that affect the transition matrix. We apply our models to interest rates and we show that our general model formulation improves the predictive ability of standard homogeneous hidden Markov models.
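
One common way to let the transition matrix depend on exogenous covariates, consistent with the abstract though not necessarily the authors' exact link function, is a multinomial logit parameterisation of each row of the transition matrix, with, say, γ_{ii} fixed at zero for identifiability:

```latex
P(s_t = j \mid s_{t-1} = i,\, z_t) \;=\;
\frac{\exp\!\left(z_t^{\top}\gamma_{ij}\right)}
     {\sum_{k=1}^{m}\exp\!\left(z_t^{\top}\gamma_{ik}\right)}.
```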

Journal ArticleDOI
TL;DR: This paper introduces a generalized adaptive rejection sampling procedure that can be applied with a broad class of target probability distributions, possibly non-log-concave and exhibiting multiple modes, and yields a sequence of proposal densities that converge toward the target pdf, thus achieving very high acceptance rates.
Abstract: Rejection sampling is a well-known method to generate random samples from arbitrary target probability distributions. It demands the design of a suitable proposal probability density function (pdf) from which candidate samples can be drawn. These samples are either accepted or rejected depending on a test involving the ratio of the target and proposal densities. The adaptive rejection sampling method is an efficient algorithm to sample from a log-concave target density, that attains high acceptance rates by improving the proposal density whenever a sample is rejected. In this paper we introduce a generalized adaptive rejection sampling procedure that can be applied with a broad class of target probability distributions, possibly non-log-concave and exhibiting multiple modes. The proposed technique yields a sequence of proposal densities that converge toward the target pdf, thus achieving very high acceptance rates. We provide a simple numerical example to illustrate the basic use of the proposed technique, together with a more elaborate positioning application using real data.
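
The accept-reject test that the method refines is simple to state in code. Below is a generic sketch, with a toy bimodal target and a hand-chosen Gaussian envelope (both illustrative assumptions, not from the paper): draw x from a proposal q with p ≤ Mq everywhere, and accept with probability p(x)/(M q(x)):

```python
# The basic accept-reject test that (generalized) adaptive rejection sampling
# builds on: draw from a proposal q enveloping the target p (p <= M*q),
# accept with probability p(x) / (M * q(x)).  Here p is a bimodal toy target
# and the envelope is a wide Gaussian (illustrative constants, not tuned).
import math
import random

p = lambda x: math.exp(-0.5 * (abs(x) - 2.0) ** 2)        # bimodal, unnormalised
q = lambda x: math.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * math.sqrt(2 * math.pi))
M = 10.0                                                   # chosen so p <= M*q

def rejection_sample():
    while True:
        x = random.gauss(0.0, 3.0)                         # draw from the proposal
        if random.random() <= p(x) / (M * q(x)):
            return x                                       # accept; otherwise retry

print([round(rejection_sample(), 2) for _ in range(5)])
```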

Journal ArticleDOI
TL;DR: It is demonstrated that it is possible to construct a multivariate slice sampler that has good mixing properties and is efficient in terms of computing time, and how parallel computing can be useful for making MCMC algorithms computationally efficient.
Abstract: Slice sampling provides an easily implemented method for constructing a Markov chain Monte Carlo (MCMC) algorithm. However, slice sampling has two major drawbacks: (i) it requires repeated evaluation of likelihoods for each update, which can make it impractical when evaluations are expensive or as the number of evaluations grows (geometrically) with the dimension of the slice sampler, and (ii) since it can be challenging to construct multivariate updates, the updates are typically univariate, which often results in slow mixing samplers. We propose an approach to multivariate slice sampling that naturally lends itself to a parallel implementation. Our approach takes advantage of recent advances in computer architectures, for instance, the newest generation of graphics cards can execute roughly 30,000 threads simultaneously. We demonstrate that it is possible to construct a multivariate slice sampler that has good mixing properties and is efficient in terms of computing time. The contributions of this article are therefore twofold. We study approaches for constructing a multivariate slice sampler, and we show how parallel computing can be useful for making MCMC algorithms computationally efficient. We study various implementations of our algorithm in the context of real and simulated data.
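
For orientation, the standard univariate slice sampler with stepping-out and shrinkage (Neal, 2003), i.e. the slow-mixing single-variable update that the proposed parallel multivariate construction is designed to improve on, looks roughly like this:

```python
# Standard univariate slice sampler with stepping-out and shrinkage
# (Neal, 2003) -- the single-variable update that the article's parallel
# multivariate construction is designed to improve on.
import math
import random

def slice_sample(logf, x0, w=1.0, m=50):
    """One update of a univariate slice sampler targeting exp(logf)."""
    log_y = logf(x0) + math.log(random.random())   # log of the auxiliary height
    # Stepping out: randomly position an interval of width w, then expand it.
    left = x0 - w * random.random()
    right = left + w
    j = random.randrange(m)
    while j > 0 and logf(left) > log_y:
        left -= w; j -= 1
    k = m - 1 - j
    while k > 0 and logf(right) > log_y:
        right += w; k -= 1
    # Shrinkage: sample uniformly in the interval, shrinking it on rejection.
    while True:
        x1 = random.uniform(left, right)
        if logf(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

logf = lambda x: -0.5 * x * x                      # standard normal target
xs = [0.0]
for _ in range(1000):
    xs.append(slice_sample(logf, xs[-1]))
print(sum(xs) / len(xs))                           # should be near 0
```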

Journal ArticleDOI
TL;DR: The induced methodology is extended by allowing for arbitrary non-parametric effects of a continuous covariate, not only on the mean, but also on the variance of the diagnostic test; this has proved useful for providing age-specific thresholds for anthropometric measures in the Galician community.
Abstract: Continuous diagnostic tests are often used to discriminate between diseased and healthy populations. The receiver operating characteristic (ROC) curve is a widely used tool that provides a graphical visualisation of the effectiveness of such tests. The potential performance of the tests in terms of distinguishing diseased from healthy people may be strongly influenced by covariates, and a variety of regression methods for adjusting ROC curves has been developed. Until now, these methodologies have assumed that covariate effects have parametric forms, but in this paper we extend the induced methodology by allowing for arbitrary non-parametric effects of a continuous covariate. To this end, local polynomial kernel smoothers are used in the estimation procedure. Our method allows for covariate effect not only on the mean, but also on the variance of the diagnostic test. We also present a bootstrap-based method for testing for a significant covariate effect on the ROC curve. To illustrate the method, endocrine data were analysed with the aim of assessing the performance of anthropometry for predicting clusters of cardiovascular risk factors in an adult population in Galicia (NW Spain), duly adjusted for age. The proposed methodology has proved useful for providing age-specific thresholds for anthropometric measures in the Galician community.

Journal ArticleDOI
TL;DR: A suitable modification of the Silhouette information, aimed at evaluating the quality of clusters in a density-based framework, is proposed. It is based on the estimation of the data posterior probabilities of belonging to the clusters, and may be used to measure the confidence about data allocation to the clusters as well as to choose the best partition among different ones.
Abstract: Silhouette information evaluates the quality of the partition detected by a clustering technique. Since it is based on a measure of distance between the clustered observations, its standard formulation is not adequate when a density-based clustering technique is used. In this work we propose a suitable modification of the Silhouette information aimed at evaluating the quality of clusters in a density-based framework. It is based on the estimation of the data posterior probabilities of belonging to the clusters and may be used to measure our confidence about data allocation to the clusters as well as to choose the best partition among different ones.

Journal ArticleDOI
TL;DR: A new solution is proposed, obtained by modelling the error term distribution through a finite mixture of multi-dimensional Gaussian components; when the number of components equals one, the proposal reduces to the classical approach.
Abstract: In some situations, the distribution of the error terms of a multivariate linear regression model may depart from normality. This problem has been addressed, for example, by specifying a different parametric distribution family for the error terms, such as multivariate skewed and/or heavy-tailed distributions. A new solution is proposed, which is obtained by modelling the error term distribution through a finite mixture of multi-dimensional Gaussian components. The multivariate linear regression model is studied under this assumption. Identifiability conditions are proved and maximum likelihood estimation of the model parameters is performed using the EM algorithm. The number of mixture components is chosen through model selection criteria; when this number is equal to one, the proposal reduces to the classical approach. The performances of the proposed approach are evaluated through Monte Carlo experiments and compared to those of other approaches. Finally, the results obtained from the analysis of a real dataset are presented.

Journal ArticleDOI
TL;DR: A framework based on recent developments in adaptive MCMC is proposed, where the problem of sampling from the posterior distribution of a parameter with a hyper-parameter set to its maximum likelihood estimate is addressed more efficiently using a single Monte Carlo run.
Abstract: In empirical Bayes inference one is typically interested in sampling from the posterior distribution of a parameter with a hyper-parameter set to its maximum likelihood estimate. This is often problematic particularly when the likelihood function of the hyper-parameter is not available in closed form and the posterior distribution is intractable. Previous works have dealt with this problem using a multi-step approach based on the EM algorithm and Markov Chain Monte Carlo (MCMC). We propose a framework based on recent developments in adaptive MCMC, where this problem is addressed more efficiently using a single Monte Carlo run. We discuss the convergence of the algorithm and its connection with the EM algorithm. We apply our algorithm to the Bayesian Lasso of Park and Casella (J. Am. Stat. Assoc. 103:681–686, 2008) and to the empirical Bayes variable selection of George and Foster (Biometrika 87:731–747, 2000).

Journal ArticleDOI
TL;DR: The proposed stochastic matching pursuit algorithm is designed for sampling from the posterior distribution of the coefficients for the purpose of variable selection in linear regression and is considered a modification of the componentwise Gibbs sampler.
Abstract: This article proposes a stochastic version of the matching pursuit algorithm for Bayesian variable selection in linear regression. In the Bayesian formulation, the prior distribution of each regression coefficient is assumed to be a mixture of a point mass at 0 and a normal distribution with zero mean and a large variance. The proposed stochastic matching pursuit algorithm is designed for sampling from the posterior distribution of the coefficients for the purpose of variable selection. The proposed algorithm can be considered a modification of the componentwise Gibbs sampler. In the componentwise Gibbs sampler, the variables are visited by a random or a systematic scan. In the stochastic matching pursuit algorithm, the variables that better align with the current residual vector are given higher probabilities of being visited. The proposed algorithm combines the efficiency of the matching pursuit algorithm and the Bayesian formulation with well defined prior distributions on coefficients. Several simulated examples of small n and large p are used to illustrate the algorithm. These examples show that the algorithm is efficient for screening and selecting variables.
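
A toy sketch of the visiting scheme described in the abstract (only the selection probabilities; the priors and Gibbs updates of the full sampler are not reproduced): variables whose columns better align with the current residual are visited with higher probability:

```python
# Toy illustration of the visiting scheme described in the abstract: variables
# better aligned with the current residual get higher probability of being
# visited.  The full stochastic matching pursuit sampler (priors, Gibbs
# updates) is not reproduced here.
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 10
X = rng.normal(size=(n, p))
X /= np.linalg.norm(X, axis=0)          # unit-norm columns
beta = np.zeros(p); beta[3] = 2.0       # only coordinate 3 is active
y = X @ beta + 0.1 * rng.normal(size=n)

residual = y.copy()                     # residual under the current (empty) model
align = np.abs(X.T @ residual)          # |x_j' r|: alignment with the residual
visit_prob = align / align.sum()        # visit probabilities for the scan
print(visit_prob.round(3))              # coordinate 3 dominates
j = rng.choice(p, p=visit_prob)         # next variable to visit
```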

Journal ArticleDOI
TL;DR: This article focuses on the grouped selection property of the adaptive elastic net along with its model selection complexity, and sheds some light on the bias-variance tradeoff of different regularization methods, including the adaptive elastic net.
Abstract: The lasso has proved to be an extremely successful technique for simultaneous estimation and variable selection. However, the lasso has two major drawbacks. First, it does not enforce any grouping effect, and second, in some situations the lasso solution is inconsistent for variable selection. To overcome this inconsistency, the adaptive lasso was proposed, in which adaptive weights are used for penalizing different coefficients. Recently, a doubly regularized technique, the elastic net, was proposed, which encourages a grouping effect, i.e. the selection or omission of correlated variables together. However, the elastic net is also inconsistent. In this paper we study the adaptive elastic net, which does not have this drawback. We focus especially on the grouped selection property of the adaptive elastic net along with its model selection complexity. We also shed some light on the bias-variance tradeoff of different regularization methods, including the adaptive elastic net. An efficient algorithm is proposed along the lines of LARS-EN, which is then illustrated with simulated as well as real-life data examples.
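
For reference, the adaptive elastic net criterion in its usual form (up to the rescaling constant used by Zou and Zhang) combines a ridge penalty with an adaptively weighted ℓ1 penalty, the weights coming from an initial elastic-net estimate:

```latex
\hat{\beta} = \arg\min_{\beta}\;
\lVert y - X\beta \rVert_2^2
+ \lambda_2 \lVert \beta \rVert_2^2
+ \lambda_1 \sum_{j=1}^{p} \hat{w}_j \lvert \beta_j \rvert,
\qquad
\hat{w}_j = \lvert \hat{\beta}_j^{\,\mathrm{enet}} \rvert^{-\gamma},\; \gamma > 0.
```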

Journal ArticleDOI
TL;DR: The proposed distribution-free two-sample test is shown to be very competitive with well-known nonparametric tests, and has high and stable power for detecting a nonconstant shift in the two-sample problem, when Wilcoxon's test may break down completely.
Abstract: Powerful entropy-based tests for normality, uniformity and exponentiality have been well addressed in the statistical literature. The density-based empirical likelihood approach improves the performance of these tests for goodness-of-fit, forming them into approximate likelihood ratios. This method is extended to develop two-sample empirical likelihood approximations to optimal parametric likelihood ratios, resulting in an efficient test based on sample entropy. The proposed distribution-free two-sample test is shown to be very competitive with well-known nonparametric tests. For example, the new test has high and stable power for detecting a nonconstant shift in the two-sample problem, when Wilcoxon's test may break down completely. This is partly due to the inherent structure developed within Neyman-Pearson type lemmas. The outputs of an extensive Monte Carlo analysis and a real data example support our theoretical results. The Monte Carlo simulation study indicates that the proposed test compares favorably with the standard procedures, for a wide range of null and alternative distributions.

Journal ArticleDOI
TL;DR: In this article, the authors compare two state-of-the-art nonlinear techniques for nonparametric function estimation via piecewise constant approximation: the taut string and the Unbalanced Haar methods.
Abstract: We compare two state-of-the-art non-linear techniques for nonparametric function estimation via piecewise constant approximation: the taut string and the Unbalanced Haar methods. While it is well-known that the latter is multiscale, it is not obvious that the former can also be interpreted as multiscale. We provide a unified multiscale representation for both methods, which offers an insight into the relationship between them as well as suggesting lessons both methods can learn from each other.

Journal ArticleDOI
TL;DR: The methodology is based on a Bayesian linear model with a quasi-default hyper-g prior and combines variable selection with parametric modelling of additive effects and a Markov chain Monte Carlo algorithm for the exploration of the model space is presented.
Abstract: This paper sets out to implement the Bayesian paradigm for fractional polynomial models under the assumption of normally distributed error terms. Fractional polynomials widen the class of ordinary polynomials and offer an additive and transportable modelling approach. The methodology is based on a Bayesian linear model with a quasi-default hyper-g prior and combines variable selection with parametric modelling of additive effects. A Markov chain Monte Carlo algorithm for the exploration of the model space is presented. This theoretically well-founded stochastic search constitutes a substantial improvement to ad hoc stepwise procedures for the fitting of fractional polynomial models. The method is applied to a data set on the relationship between ozone levels and meteorological parameters, previously analysed in the literature.
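
To make the model class concrete: a second-degree fractional polynomial (FP2) for a positive covariate x takes the form below, with powers drawn from the conventional set and the Box-Tidwell convention x^(0) = log x; a repeated power p yields β₁x^p + β₂x^p log x:

```latex
\eta(x) = \beta_0 + \beta_1 x^{(p_1)} + \beta_2 x^{(p_2)},
\qquad p_1, p_2 \in \{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}.
```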

Journal ArticleDOI
TL;DR: The proposed procedure is very flexible and can be extended to trend and/or repeated measure problems, and some comparisons through simulations and examples with the well known Mack & Wolfe test for umbrella alternative and with Page’s test for trend problems with correlated data are investigated.
Abstract: There is a wide variety of stochastic ordering problems where K groups (typically ordered with respect to time) are observed along with a (continuous) response. The interest of the study may be on finding the change-point group, i.e. the group where an inversion of trend of the variable under study is observed. A change point is not merely a maximum (or a minimum) of the time-series function, but a further requirement is that the trend of the time-series is monotonically increasing before that point, and monotonically decreasing afterwards. A suitable solution can be provided within a conditional approach, i.e. by considering some suitable nonparametric combination of dependent tests for simple stochastic ordering problems. The proposed procedure is very flexible and can be extended to trend and/or repeated measure problems. Some comparisons through simulations and examples with the well known Mack & Wolfe test for umbrella alternative and with Page's test for trend problems with correlated data are investigated.

Journal ArticleDOI
TL;DR: Computational expressions for the exact CDF of Roy’s test statistic in MANOVA and the largest eigenvalue of a Wishart matrix are derived based upon their Pfaffian representations given in Gupta and Richards.
Abstract: Computational expressions for the exact CDF of Roy's test statistic in MANOVA and the largest eigenvalue of a Wishart matrix are derived based upon their Pfaffian representations given in Gupta and Richards (SIAM J. Math. Anal. 16:852–858, 1985). These expressions allow computations to proceed until a prespecified degree of accuracy is achieved. For both distributions, convergence acceleration methods are used to compute CDF values which achieve reasonably fast run times for dimensions up to 50 and error degrees of freedom as large as 100. Software that implements these computations is described and has been made available on the Web.

Journal ArticleDOI
TL;DR: A more efficient computational algorithm is presented to overcome the intractability of the Markov random field model and the results of this algorithm are encouraging in comparison to the k-nearest neighbour algorithm.
Abstract: This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space, than those which are further away. Our approach builds on previous work by Holmes and Adams (J. R. Stat. Soc. Ser. B 64:295–306, 2002; Biometrika 90:99–112, 2003) and Cucala et al. (J. Am. Stat. Assoc. 104:263–273, 2009). Our work shares many of the advantages of these approaches in providing a probabilistic basis for the statistical inference. In comparison to previous work, we present a more efficient computational algorithm to overcome the intractability of the Markov random field model. The results of our algorithm are encouraging in comparison to the k-nearest neighbour algorithm.