Showing papers in "arXiv: Methodology in 2010"

Posted Content•

Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain

Alexandre Belloni, Daniel L. Chen, Victor Chernozhukov, Christian Hansen

21 Oct 2010-arXiv: Methodology

TL;DR: The paper provides a fully data-driven method for choosing the penalty level required to obtain Lasso and Post-Lasso estimates, and establishes its asymptotic validity under non-Gaussian, heteroscedastic disturbances.

Abstract: We develop results for the use of Lasso and Post-Lasso methods to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments, $p$. Our results apply even when $p$ is much larger than the sample size, $n$. We show that the IV estimator based on using Lasso or Post-Lasso in the first stage is root-n consistent and asymptotically normal when the first-stage is approximately sparse; i.e. when the conditional expectation of the endogenous variables given the instruments can be well-approximated by a relatively small set of variables whose identities may be unknown. We also show the estimator is semi-parametrically efficient when the structural error is homoscedastic. Notably our results allow for imperfect model selection, and do not rely upon the unrealistic "beta-min" conditions that are widely used to establish validity of inference following model selection. In simulation experiments, the Lasso-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument-robust procedures. In an empirical example dealing with the effect of judicial eminent domain decisions on economic outcomes, the Lasso-based IV estimator outperforms an intuitive benchmark. In developing the IV results, we establish a series of new results for Lasso and Post-Lasso estimators of nonparametric conditional expectation functions which are of independent theoretical and practical interest. We construct a modification of Lasso designed to deal with non-Gaussian, heteroscedastic disturbances which uses a data-weighted $\ell_1$-penalty function. Using moderate deviation theory for self-normalized sums, we provide convergence rates for the resulting Lasso and Post-Lasso estimators that are as sharp as the corresponding rates in the homoscedastic Gaussian case under the condition that $\log p = o(n^{1/3})$.
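As a rough illustration of the data-weighted $\ell_1$-penalty mentioned in the abstract, the first-stage Lasso can be written as below. The symbols $\lambda$ (penalty level) and $\hat\gamma_j$ (data-dependent penalty loadings) are notation assumed here for the sketch, not taken from the listing; Post-Lasso then refits by ordinary least squares on the variables with nonzero $\hat\beta_j$.

$$
\hat\beta \in \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{n}\sum_{i=1}^{n} \left(y_i - x_i'\beta\right)^2 \;+\; \frac{\lambda}{n}\sum_{j=1}^{p} \hat\gamma_j \,|\beta_j|
$$

Setting all $\hat\gamma_j$ equal recovers the ordinary Lasso; letting them vary with the data is what allows for heteroscedastic, non-Gaussian disturbances.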

495 citations

Journal Article•DOI•

The Impact of Levene's Test of Equality of Variances on Statistical Theory and Practice

Joseph L. Gastwirth¹, Yulia R. Gel², Weiwen Miao³•Institutions (3)

George Washington University¹, University of Waterloo², Haverford College³

02 Oct 2010-arXiv: Methodology

TL;DR: In this article, a modification of Levene-type tests to increase their power to detect monotonic trends in variances is discussed, which is useful when one is concerned with an alternative of increasing or decreasing variability, for example, increasing volatility of stock prices or "open or closed gramophones" in regression residual analysis.

Abstract: In many applications, the underlying scientific question concerns whether the variances of $k$ samples are equal. There are a substantial number of tests for this problem. Many of them rely on the assumption of normality and are not robust to its violation. In 1960 Professor Howard Levene proposed a new approach to this problem by applying the $F$-test to the absolute deviations of the observations from their group means. Levene's approach is powerful and robust to nonnormality and became a very popular tool for checking the homogeneity of variances. This paper reviews the original method proposed by Levene and subsequent robust modifications. A modification of Levene-type tests to increase their power to detect monotonic trends in variances is discussed. This procedure is useful when one is concerned with an alternative of increasing or decreasing variability, for example, increasing volatility of stock prices or "open or closed gramophones" in regression residual analysis. A major section of the paper is devoted to discussion of various scientific problems where Levene-type tests have been used, for example, economic anthropology, accuracy of medical measurements, volatility of the price of oil, studies of the consistency of jury awards in legal cases and the effect of hurricanes on ecological systems.
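The original test described above (an $F$-test applied to absolute deviations from group means) can be sketched in a few lines. This is a minimal illustration, not the paper's code; the function name `levene_statistic` is hypothetical, and the robust variants (for example, centering at the median) and the trend modification are not shown.

```python
from statistics import mean


def levene_statistic(groups):
    """Levene's W: the one-way ANOVA F statistic computed on the
    absolute deviations of each observation from its group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's own mean.
    z = []
    for g in groups:
        center = mean(g)
        z.append([abs(x - center) for x in g])
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zb - z_grand_mean) ** 2
                  for zi, zb in zip(z, z_group_means))
    within = sum((v - zb) ** 2
                 for zi, zb in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    print(levene_statistic([[1, 2, 3], [2, 4, 6]]))
```

Under the null hypothesis of equal variances, the statistic is compared with an $F$ distribution with $k-1$ and $N-k$ degrees of freedom.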

355 citations

Journal Article•DOI•

Robust rank correlation based screening

Gaorong Li¹, Heng Peng, Jun Zhang², Lixing Zhu³•Institutions (3)

Beijing University of Technology¹, Shenzhen University², Hong Kong Baptist University³

20 Dec 2010-arXiv: Methodology

TL;DR: In this article, a robust rank correlation screening (RRCS) method is proposed to deal with ultra-high dimensional data, which is based on the Kendall $\tau$ correlation coefficient between response and predictor variables rather than the Pearson correlation.

Abstract: Independence screening is a variable selection method that uses a ranking criterion to select significant variables, particularly for statistical models with nonpolynomial dimensionality or "large p, small n" paradigms when p can be as large as an exponential of the sample size n. In this paper we propose a robust rank correlation screening (RRCS) method to deal with ultra-high dimensional data. The new procedure is based on the Kendall $\tau$ correlation coefficient between response and predictor variables rather than the Pearson correlation of existing methods. The new method has four desirable features compared with existing independence screening methods. First, the sure independence screening property can hold under only the existence of a second-order moment of predictor variables, rather than exponential tails or the like, even when the number of predictor variables grows exponentially with the sample size. Second, it can be used to deal with semiparametric models such as transformation regression models and single-index models under a monotonic constraint on the link function without involving nonparametric estimation, even when there are nonparametric functions in the models. Third, the procedure is largely robust against outliers and influential points in the observations. Last, the use of indicator functions in rank correlation screening greatly simplifies the theoretical derivation due to the boundedness of the resulting statistics, compared with previous studies on variable screening. Simulations are carried out for comparisons with existing methods and a real data example is analyzed.
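The screening idea can be illustrated with a small sketch: compute a Kendall-type rank correlation between the response and each predictor, then keep the predictors with the largest absolute values. The function names and the "keep the top `d`" rule are assumptions made for this example; the paper's exact statistic and threshold are not reproduced here.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: average sign agreement over all pairs of observations."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return 2.0 * score / (n * (n - 1))


def rank_screen(columns, y, d):
    """Return the indices of the d predictors with the largest |tau|.

    `columns` is a list of predictor columns, each the same length as y.
    """
    scores = [abs(kendall_tau(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:d]


if __name__ == "__main__":
    y = [1, 2, 3, 4, 5]
    columns = [[5, 1, 4, 2, 3], [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]
    print(rank_screen(columns, y, 2))
```

Because the statistic depends only on signs of pairwise differences, a single extreme observation changes it very little, which is the source of the robustness the abstract describes.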

265 citations

Posted Content•

Outlier Detection Using Nonconvex Penalized Regression

Yiyuan She¹, Art B. Owen¹•Institutions (1)

Florida State University¹

14 Jun 2010-arXiv: Methodology

TL;DR: A thresholding-based iterative procedure for outlier detection ($\Theta$-IPOD) is introduced; a version based on hard thresholding correctly identifies outliers on some hard test problems and is much faster than iteratively reweighted least squares for large data, because each iteration costs at most $O(np)$ (and sometimes much less), avoiding an $O(np^2)$ least squares estimate.

Abstract: This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the $n$ data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual $L_1$ penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The $L_1$ penalty corresponds to soft thresholding. We introduce a thresholding (denoted by $\Theta$) based iterative procedure for outlier detection ($\Theta$-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that $\Theta$-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most $O(np)$ (and sometimes much less) avoiding an $O(np^2)$ least squares estimate. We describe the connection between $\Theta$-IPOD and $M$-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned $\Theta$-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with $p\gg n$, if both the coefficient vector and the outlier pattern are sparse.
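The mean-shift idea can be sketched for a simple straight-line fit: alternate between fitting least squares to the outlier-adjusted response and hard-thresholding the residuals to update the mean-shift vector. This is a simplified illustration under assumed names (`ipod_line`, `lam`), not the authors' implementation, and it does not use the $O(np)$ update that makes the real procedure fast.

```python
def ols_line(x, y):
    """Closed-form least squares fit of y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def hard_threshold(r, lam):
    """Keep a residual only if it exceeds the threshold in absolute value."""
    return r if abs(r) > lam else 0.0


def ipod_line(x, y, lam, n_iter=200, tol=1e-10):
    """Alternate between refitting the line and thresholding residuals.

    gamma[i] is the estimated mean shift of observation i; nonzero
    entries flag suspected outliers.
    """
    gamma = [0.0] * len(y)
    for _ in range(n_iter):
        a, b = ols_line(x, [yi - gi for yi, gi in zip(y, gamma)])
        new_gamma = [hard_threshold(yi - a - b * xi, lam)
                     for xi, yi in zip(x, y)]
        converged = max(abs(g1 - g0)
                        for g1, g0 in zip(new_gamma, gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return a, b, gamma


if __name__ == "__main__":
    x = list(range(10))
    y = [2.0 * xi + 1.0 for xi in x]
    y[7] += 20.0  # plant one gross outlier
    a, b, gamma = ipod_line(x, y, lam=8.0)
    print(a, b, [i for i, g in enumerate(gamma) if g != 0.0])
```

Replacing `hard_threshold` with soft thresholding gives the $L_1$ version that the abstract reports as non-robust.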

230 citations

Journal Article•DOI•

Identification, Inference and Sensitivity Analysis for Causal Mediation Effects

Kosuke Imai¹, Luke Keele², Teppei Yamamoto¹•Institutions (2)

Princeton University¹, Ohio State University²

04 Nov 2010-arXiv: Methodology

TL;DR: In this paper, the average causal mediation effect (ACME) is shown to be nonparametrically identified under a particular version of the sequential ignorability assumption, and a sensitivity analysis is proposed to examine the robustness of empirical findings to the possible existence of an unmeasured confounder.

Abstract: Causal mediation analysis is routinely conducted by applied researchers in a variety of disciplines. The goal of such an analysis is to investigate alternative causal mechanisms by examining the roles of intermediate variables that lie in the causal paths between the treatment and outcome variables. In this paper we first prove that under a particular version of the sequential ignorability assumption, the average causal mediation effect (ACME) is nonparametrically identified. We compare our identification assumption with those proposed in the literature. Some practical implications of our identification result are also discussed. In particular, the popular estimator based on the linear structural equation model (LSEM) can be interpreted as an ACME estimator once additional parametric assumptions are made. We show that these assumptions can easily be relaxed within and outside of the LSEM framework and propose simple nonparametric estimation strategies. Second, and perhaps most importantly, we propose a new sensitivity analysis that can be easily implemented by applied researchers within the LSEM framework. Like the existing identifying assumptions, the proposed sequential ignorability assumption may be too strong in many applied settings. Thus, sensitivity analysis is essential in order to examine the robustness of empirical findings to the possible existence of an unmeasured confounder. Finally, we apply the proposed methods to a randomized experiment from political psychology. We also make easy-to-use software available to implement the proposed methods.
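To make the LSEM remark concrete, a common way to write the linear structural equation model is shown below, with treatment $T_i$, mediator $M_i$, and outcome $Y_i$. The coefficient names are notation assumed for this sketch and pre-treatment covariates are omitted.

$$
\begin{aligned}
M_i &= \alpha_2 + \beta_2 T_i + \varepsilon_{i2},\\
Y_i &= \alpha_3 + \beta_3 T_i + \gamma M_i + \varepsilon_{i3}.
\end{aligned}
$$

Under sequential ignorability and these linearity assumptions, the product-of-coefficients quantity $\beta_2 \gamma$ is the ACME, and $\beta_3$ is the corresponding direct effect.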

216 citations

Journal Article•DOI•

The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis

Russell Lyons¹•Institutions (1)

Indiana University¹

16 Jul 2010-arXiv: Methodology

TL;DR: A series of recent papers by Christakis and Fowler advances statistical arguments for the transmission via social networks of various personal characteristics, including obesity, smoking cessation, happiness, and loneliness, and asserts that such influence extends to three degrees of separation; this paper shows that those conclusions do not follow from the statistical analyses.

Abstract: The chronic widespread misuse of statistics is usually inadvertent, not intentional. We find cautionary examples in a series of recent papers by Christakis and Fowler that advance statistical arguments for the transmission via social networks of various personal characteristics, including obesity, smoking cessation, happiness, and loneliness. Those papers also assert that such influence extends to three degrees of separation in social networks. We shall show that these conclusions do not follow from Christakis and Fowler's statistical analyses. In fact, their studies even provide some evidence against the existence of such transmission. The errors that we expose arose, in part, because the assumptions behind the statistical procedures used were insufficiently examined, not only by the authors, but also by the reviewers. Our examples are instructive because the practitioners are highly reputed, their results have received enormous popular attention, and the journals that published their studies are among the most respected in the world. An educational bonus emerges from the difficulty we report in getting our critique published. We discuss the relevance of this episode to understanding statistical literacy and the role of scientific review, as well as to reforming statistics education.

169 citations

Journal Article•DOI•

Point process modeling for directed interaction networks

Patrick O. Perry, Patrick J. Wolfe

08 Nov 2010-arXiv: Methodology

TL;DR: A model is introduced for treating directed interactions as a multivariate point process: a Cox multiplicative intensity model using covariates that depend on the history of the process, and consistency and asymptotic normality are proved.

Abstract: Network data often take the form of repeated interactions between senders and receivers tabulated over time. A primary question to ask of such data is which traits and behaviors are predictive of interaction. To answer this question, a model is introduced for treating directed interactions as a multivariate point process: a Cox multiplicative intensity model using covariates that depend on the history of the process. Consistency and asymptotic normality are proved for the resulting partial-likelihood-based estimators under suitable regularity conditions, and an efficient fitting procedure is described. Multicast interactions--those involving a single sender but multiple receivers--are treated explicitly. The resulting inferential framework is then employed to model message sending behavior in a corporate e-mail network. The analysis gives a precise quantification of which static shared traits and dynamic network effects are predictive of message recipient selection.
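A Cox multiplicative intensity model of this kind can be sketched as follows. The notation is assumed for illustration: $\lambda_t(i, j)$ is the rate at which sender $i$ contacts receiver $j$ at time $t$, $\bar\lambda_t(i)$ is an unspecified sender-specific baseline, and $x_t(i, j)$ is a covariate vector that may depend on the history of the process up to $t$.

$$
\lambda_t(i, j) = \bar\lambda_t(i)\, \exp\{\beta^\top x_t(i, j)\}
$$

For single-receiver messages, the baseline cancels in the usual partial likelihood, where event $m$ has sender $i_m$, receiver $j_m$, and time $t_m$:

$$
PL(\beta) = \prod_{m} \frac{\exp\{\beta^\top x_{t_m}(i_m, j_m)\}}{\sum_{j'} \exp\{\beta^\top x_{t_m}(i_m, j')\}}
$$

The multicast case treated in the paper requires a modified likelihood that is not shown here.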

157 citations

Posted Content•

Estimating False Discovery Proportion Under Arbitrary Covariance Dependence

Jianqing Fan¹, Xu Han², Weijie Gu³•Institutions (3)

Shanghai University of Finance and Economics¹, University of Florida², Princeton University³

28 Oct 2010-arXiv: Methodology

TL;DR: An approximate expression for the false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used, and a consistent estimate of the realized FDP, are provided; these have important applications in controlling the false discovery rate and FDP.

Abstract: Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any SNPs are associated with some traits, and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the current paper, we propose a novel method based on principal factor approximation, which successfully subtracts the common dependence and significantly weakens the correlation structure, to deal with an arbitrary dependence structure. We derive an approximate expression for false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of realized FDP. This result has important applications in controlling FDR and FDP. Our estimate of realized FDP compares favorably with Efron's (2007) approach, as demonstrated in the simulated examples. Our approach is further illustrated by some real data applications. We also propose a dependence-adjusted procedure, which is more powerful than the fixed threshold procedure.

152 citations

Journal Article•DOI•

Estimation for High-Dimensional Linear Mixed-Effects Models Using $\ell_1$-Penalization

Jürg Schelldorfer, Peter Bühlmann, Sara van de Geer

19 Feb 2010-arXiv: Methodology

TL;DR: In this article, the authors propose an $\ell_1$-penalized estimation procedure for high-dimensional linear mixed-effects models, which are useful for data with a grouping structure, and prove a consistency and an oracle optimality result.

Abstract: We propose an $\ell_1$-penalized estimation procedure for high-dimensional linear mixed-effects models. The models are useful whenever there is a grouping structure among high-dimensional observations, i.e. for clustered data. We prove a consistency and an oracle optimality result and we develop an algorithm with provable numerical convergence. Furthermore, we demonstrate the performance of the method on simulated and a real high-dimensional data set.

149 citations

Journal Article•DOI•

Estimating Effects and Making Predictions from Genome-Wide Marker Data

Michael E. Goddard, Naomi R. Wray, Klara L. Verbyla¹, Peter M. Visscher¹•Institutions (1)

QIMR Berghofer Medical Research Institute¹

22 Oct 2010-arXiv: Methodology

TL;DR: In this article, an integrated approach to the estimation of SNP effects and to the prediction of trait values is proposed, treating SNP effects as random instead of fixed effects, together with a different definition of unbiasedness that is a property of the advocated estimator.

Abstract: In genome-wide association studies (GWAS), hundreds of thousands of genetic markers (SNPs) are tested for association with a trait or phenotype. Reported effects tend to be larger in magnitude than the true effects of these markers, the so-called "winner's curse." We argue that the classical definition of unbiasedness is not useful in this context and propose to use a different definition of unbiasedness that is a property of the estimator we advocate. We suggest an integrated approach to the estimation of the SNP effects and to the prediction of trait values, treating SNP effects as random instead of fixed effects. Statistical methods traditionally used in the prediction of trait values in the genetics of livestock, which predates the availability of SNP data, can be applied to analysis of GWAS, giving better estimates of the SNP effects and predictions of phenotypic and genetic values in individuals.

144 citations

Posted Content•

Optimization Under Unknown Constraints

Robert B. Gramacy¹, H. K. H. Lee, C. Holmes, M. Osborne•Institutions (1)

University of Cambridge¹

22 Apr 2010-arXiv: Methodology

TL;DR: A new integrated improvement criterion is proposed to recognize that responses from inputs that violate the constraint may still be informative about the function, and thus could potentially be useful in the optimization.

Abstract: Optimization of complex functions, such as the output of computer simulators, is a difficult task that has received much attention in the literature. A less studied problem is that of optimization under unknown constraints, i.e., when the simulator must be invoked both to determine the typical real-valued response and to determine if a constraint has been violated, either for physical or policy reasons. We develop a statistical approach based on Gaussian processes and Bayesian learning to both approximate the unknown function and estimate the probability of meeting the constraints. A new integrated improvement criterion is proposed to recognize that responses from inputs that violate the constraint may still be informative about the function, and thus could potentially be useful in the optimization. The new criterion is illustrated on synthetic data, and on a motivating optimization problem from health care policy.

Posted Content•

Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC

Paul Fearnhead, Dennis Prangle

07 Apr 2010-arXiv: Methodology

TL;DR: This work shows how to construct appropriate summary statistics for ABC in a semi-automatic manner: theory shows that the optimal summary statistics are the posterior means of the parameters, and because these cannot be calculated analytically, an extra stage of simulation is used to estimate them.

Abstract: Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data to summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. While these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics, that can result in substantially more accurate ABC analyses than the ad-hoc choices of summary statistics proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference.
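The two-stage idea can be sketched on a toy problem. Everything here is an assumption chosen for illustration: a normal model with unknown mean, a uniform prior on (-5, 5), the sample mean as the only candidate feature, and plain rejection ABC. A pilot set of simulations is used to regress the parameter on the feature, and the fitted value then serves as the summary statistic. With a single feature the regression only rescales it; the method is more useful when several candidate features are available.

```python
import random


def simulate(theta, m, rng):
    """Toy model: m independent draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(m)]


def feature(data):
    """Candidate feature of the data (here just the sample mean)."""
    return sum(data) / len(data)


def fit_summary(m, n_pilot, rng):
    """Pilot stage: regress theta on the feature over prior simulations.

    The fitted value a + b * feature(data) estimates the posterior mean
    of theta and is returned as the summary statistic.
    """
    thetas = [rng.uniform(-5, 5) for _ in range(n_pilot)]
    feats = [feature(simulate(t, m, rng)) for t in thetas]
    f_bar, t_bar = sum(feats) / n_pilot, sum(thetas) / n_pilot
    b = (sum((f - f_bar) * (t - t_bar) for f, t in zip(feats, thetas))
         / sum((f - f_bar) ** 2 for f in feats))
    a = t_bar - b * f_bar
    return lambda data: a + b * feature(data)


def abc_rejection(observed, summary, n_draws, eps, rng):
    """Keep prior draws whose simulated summary lands within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5, 5)
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) < eps:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    observed = simulate(2.0, 20, rng)
    summary = fit_summary(20, 500, rng)
    post = abc_rejection(observed, summary, 5000, 0.2, rng)
    print(len(post), sum(post) / len(post))
```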

Posted Content•

Likelihood-free Markov chain Monte Carlo

Scott A. Sisson, Yanan Fan

13 Jan 2010-arXiv: Methodology

TL;DR: A chapter on likelihood-free Markov chain Monte Carlo, to appear in the MCMC handbook, S. P. Brooks, A. Gelman, G. Jones and X.-L. Meng (eds), Chapman & Hall.

Abstract: To appear in the MCMC handbook, S. P. Brooks, A. Gelman, G. Jones and X.-L. Meng (eds), Chapman & Hall.

Posted Content•

A self-normalized approach to confidence interval construction in time series

Xiaofeng Shao¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

12 May 2010-arXiv: Methodology

TL;DR: In this paper, the authors propose a new method to construct confidence intervals for quantities associated with a stationary time series, which avoids direct estimation of the asymptotic variances.

Abstract: We propose a new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of any user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed using recursive estimates. Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement and can be readily used by the practitioner. Monte-Carlo simulations are conducted to compare the finite sample performance of the new method with those delivered by the normal approximation and the block bootstrap approach.
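For the simplest case, the mean of a scalar series, the self-normalized interval can be sketched as below. The normalizer is built from recursive (partial-sum) estimates instead of a long-run variance estimate. The function name and the explicit `crit` argument are choices made for this sketch: the critical value has to be taken from the tabulated non-standard limiting distribution, and a value of about 45.4 is often quoted for the 95% level, which should be checked against the paper's table before use.

```python
from math import sqrt


def self_normalized_ci(x, crit):
    """Confidence interval for the mean of a stationary series.

    Uses the normalizer W_n = n^{-2} * sum_t (S_t - t * xbar)^2, where
    S_t is the partial sum of the first t observations, and inverts
    n * (xbar - mu)^2 / W_n <= crit.
    """
    n = len(x)
    xbar = sum(x) / n
    partial_sum = 0.0
    w = 0.0
    for t, xt in enumerate(x, start=1):
        partial_sum += xt
        w += (partial_sum - t * xbar) ** 2
    w /= n ** 2
    half_width = sqrt(crit * w / n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    # crit=45.4 is an assumed 95% critical value; verify before relying on it.
    print(self_normalized_ci([0.3, -0.1, 0.8, 0.4, 0.2, 0.9, 0.1, 0.5], crit=45.4))
```

No bandwidth or block length appears anywhere in the computation, which is the convenience the abstract emphasizes.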

Posted Content•

A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data

Pritam Ranjan¹, Ronald D. Haynes, Richard Karsten•Institutions (1)

Acadia University¹

05 Mar 2010-arXiv: Methodology

TL;DR: The paper proposes a lower bound on the nugget that minimizes over-smoothing, together with an iterative regularization approach for constructing a predictor that further improves the interpolation accuracy.

Abstract: For many expensive deterministic computer simulators, the outputs do not have replication error and the desired metamodel (or statistical emulator) is an interpolator of the observed data. Realizations of Gaussian spatial processes (GP) are commonly used to model such simulator outputs. Fitting a GP model to $n$ data points requires the computation of the inverse and determinant of $n \times n$ correlation matrices, $R$, that are sometimes computationally unstable due to near-singularity of $R$. This happens if any pair of design points are very close together in the input space. The popular approach to overcome near-singularity is to introduce a small nugget (or jitter) parameter in the model that is estimated along with other model parameters. The inclusion of a nugget in the model often causes unnecessary over-smoothing of the data. In this paper, we propose a lower bound on the nugget that minimizes the over-smoothing and an iterative regularization approach to construct a predictor that further improves the interpolation accuracy. We also show that the proposed predictor converges to the GP interpolator.
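One way to see where a lower bound on the nugget can come from is the following short derivation. It assumes, for illustration only, that computations are considered stable when the condition number of $R + \delta I$ is at most $e^{a}$ for some chosen threshold $a$; the symbols $a$, $\delta$, and $\lambda_1 \le \dots \le \lambda_n$ (the eigenvalues of $R$) are notation introduced here, with $\kappa(R) = \lambda_n / \lambda_1$.

$$
\kappa(R + \delta I) = \frac{\lambda_n + \delta}{\lambda_1 + \delta} \le e^{a}
\quad\Longleftrightarrow\quad
\delta \ge \frac{\lambda_n \left(\kappa(R) - e^{a}\right)}{\kappa(R)\left(e^{a} - 1\right)}
$$

so the smallest admissible nugget under this criterion is

$$
\delta_{lb} = \max\left\{ \frac{\lambda_n \left(\kappa(R) - e^{a}\right)}{\kappa(R)\left(e^{a} - 1\right)},\; 0 \right\}.
$$

When $R$ is already well conditioned, the bound is zero and no nugget is needed.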

Posted Content•

Separable covariance arrays via the Tucker product, with applications to multivariate relational data

Peter D. Hoff¹•Institutions (1)

University of Washington¹

12 Aug 2010-arXiv: Methodology

TL;DR: In this article, an extension of the matrix normal model to accommodate multidimensional data arrays, or tensors, is presented, and the model is illustrated in an analysis of multivariate longitudinal network data.

Abstract: Modern datasets are often in the form of matrices or arrays, potentially having correlations along each set of data indices. For example, data involving repeated measurements of several variables over time may exhibit temporal correlation as well as correlation among the variables. A possible model for matrix-valued data is the class of matrix normal distributions, which is parametrized by two covariance matrices, one for each index set of the data. In this article we describe an extension of the matrix normal model to accommodate multidimensional data arrays, or tensors. We generate a class of array normal distributions by applying a group of multilinear transformations to an array of independent standard normal random variables. The covariance structures of the resulting class take the form of outer products of dimension-specific covariance matrices. We derive some properties of these covariance structures and the corresponding array normal distributions, discuss maximum likelihood and Bayesian estimation of covariance parameters and illustrate the model in an analysis of multivariate longitudinal network data.
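The construction can be summarized in symbols. The notation is assumed here for the sketch: $Z$ is a $K$-way array of independent standard normal entries, $A_1, \dots, A_K$ are square matrices acting along each mode, and $\times$ denotes the Tucker product.

$$
Y = M + Z \times \{A_1, \dots, A_K\}, \qquad \Sigma_k = A_k A_k^\top
$$

$$
\operatorname{vec}(Y) \sim N\!\left(\operatorname{vec}(M),\; \Sigma_K \otimes \cdots \otimes \Sigma_1\right)
$$

With $K = 2$ this reduces to the matrix normal model with one row covariance and one column covariance.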

Posted Content•

Copula Processes

Andrew Gordon Wilson¹, Zoubin Ghahramani¹•Institutions (1)

University of Cambridge¹

07 Jun 2010-arXiv: Methodology

TL;DR: A stochastic volatility model, Gaussian Copula Process Volatility (GCPV), is developed, which can outperform GARCH on simulated and financial data, and incorporate covariates other than time, and model a rich class of covariance structures.

Abstract: We define a copula process which describes the dependencies between arbitrarily many random variables independently of their marginal distributions. As an example, we develop a stochastic volatility model, Gaussian Copula Process Volatility (GCPV), to predict the latent standard deviations of a sequence of random variables. To make predictions we use Bayesian inference, with the Laplace approximation, and with Markov chain Monte Carlo as an alternative. We find both methods comparable. We also find our model can outperform GARCH on simulated and financial data. And unlike GARCH, GCPV can easily handle missing data, incorporate covariates other than time, and model a rich class of covariance structures.

Journal Article•DOI•

Robust graphical modeling of gene networks using classical and alternative T-distributions

Michael Finegold¹, Mathias Drton¹•Institutions (1)

University of Chicago¹

19 Sep 2010-arXiv: Methodology

TL;DR: It is demonstrated that penalized likelihood inference combined with an application of the EM algorithm provides a computationally efficient approach to model selection in the $t$-distribution case.

Abstract: Graphical Gaussian models have proven to be useful tools for exploring network structures based on multivariate data. Applications to studies of gene expression have generated substantial interest in these models, and resulting recent progress includes the development of fitting methodology involving penalization of the likelihood function. In this paper we advocate the use of multivariate $t$-distributions for more robust inference of graphs. In particular, we demonstrate that penalized likelihood inference combined with an application of the EM algorithm provides a computationally efficient approach to model selection in the $t$-distribution case. We consider two versions of multivariate $t$-distributions, one of which requires the use of approximation techniques. For this distribution, we describe a Markov chain Monte Carlo EM algorithm based on a Gibbs sampler as well as a simple variational approximation that makes the resulting method feasible in large problems.

Journal Article•DOI•

Inference and Modeling with Log-concave Distributions

Guenther Walther¹•Institutions (1)

Stanford University¹

02 Oct 2010-arXiv: Methodology

TL;DR: This article reviews recent results on the theory and applications of log-concave distributions, a rich nonparametric class for which the MLE exists and can be computed with readily available algorithms, with no tuning parameter required.

Abstract: Log-concave distributions are an attractive choice for modeling and inference, for several reasons: The class of log-concave distributions contains most of the commonly used parametric distributions and thus is a rich and flexible nonparametric class of distributions. Further, the MLE exists and can be computed with readily available algorithms. Thus, no tuning parameter, such as a bandwidth, is necessary for estimation. Due to these attractive properties, there has been considerable recent research activity concerning the theory and applications of log-concave distributions. This article gives a review of these results.

Posted Content•

Reversible jump Markov chain Monte Carlo

Y Fan, S A Sisson

13 Jan 2010-arXiv: Methodology

TL;DR: A chapter on reversible jump Markov chain Monte Carlo, to appear in the MCMC handbook, S. P. Brooks, A. Gelman, G. Jones and X.-L. Meng (eds), Chapman & Hall.

Abstract: To appear in the MCMC handbook, S. P. Brooks, A. Gelman, G. Jones and X.-L. Meng (eds), Chapman & Hall.

Posted Content•

Efficient Bayesian Inference for Generalized Bradley-Terry Models

François Caron¹, Arnaud Doucet²•Institutions (2)

French Institute for Research in Computer Science and Automation¹, University of Bordeaux²

08 Nov 2010-arXiv: Methodology

TL;DR: It is shown here that iterative minorization-maximization algorithms can be reinterpreted as special instances of expectation-maximization algorithms associated with suitable sets of latent variables, which in turn allow for simple Gibbs samplers for Bayesian inference.

Abstract: The Bradley-Terry model is a popular approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. It has found many applications including animal behaviour, chess ranking and multiclass classification. Numerous extensions of the basic model have also been proposed in the literature including models with ties, multiple comparisons, group comparisons and random graphs. From a computational point of view, Hunter (2004) has proposed efficient iterative MM (minorization-maximization) algorithms to perform maximum likelihood estimation for these generalized Bradley-Terry models whereas Bayesian inference is typically performed using MCMC (Markov chain Monte Carlo) algorithms based on tailored Metropolis-Hastings (M-H) proposals. We show here that these MM algorithms can be reinterpreted as special instances of Expectation-Maximization (EM) algorithms associated with suitable sets of latent variables and propose some original extensions. These latent variables allow us to derive simple Gibbs samplers for Bayesian inference. We demonstrate experimentally the efficiency of these algorithms on a variety of applications.
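For the basic pairwise model, both ideas fit in a short sketch. The first function is the classical MM update for the maximum likelihood skills; the second is a Gibbs sampler that introduces one gamma-distributed latent variable per compared pair, under independent Gamma(`a`, `b`) priors on the skills. Function names, default hyperparameters, and the normalization step are choices made for this illustration, and the sketch assumes every item has at least one win and one comparison.

```python
import random


def bradley_terry_mm(wins, n_iter=200):
    """MM iterations for Bradley-Terry skills.

    wins[i][j] is the number of times item i beat item j.
    Skills are normalized to sum to one after each sweep.
    """
    k = len(wins)
    skill = [1.0] * k
    for _ in range(n_iter):
        new = []
        for i in range(k):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (skill[i] + skill[j])
                        for j in range(k) if j != i)
            new.append(total_wins / denom)
        norm = sum(new)
        skill = [s / norm for s in new]
    return skill


def bradley_terry_gibbs(wins, n_samples, a=1.0, b=1.0, seed=0):
    """Gibbs sampler with latent variables, skills ~ Gamma(a, rate b) a priori."""
    rng = random.Random(seed)
    k = len(wins)
    skill = [1.0] * k
    draws = []
    for _ in range(n_samples):
        # Latent step: Z_ij ~ Gamma(n_ij, rate = skill_i + skill_j).
        z_sum = [0.0] * k
        for i in range(k):
            for j in range(i + 1, k):
                n_ij = wins[i][j] + wins[j][i]
                if n_ij > 0:
                    z = rng.gammavariate(n_ij, 1.0 / (skill[i] + skill[j]))
                    z_sum[i] += z
                    z_sum[j] += z
        # Skill step: skill_i ~ Gamma(a + wins_i, rate = b + sum of its Z's).
        # random.gammavariate takes a scale, so the rate is inverted.
        skill = [rng.gammavariate(a + sum(wins[i]), 1.0 / (b + z_sum[i]))
                 for i in range(k)]
        draws.append(list(skill))
    return draws


if __name__ == "__main__":
    print(bradley_terry_mm([[0, 3], [1, 0]]))
    draws = bradley_terry_gibbs([[0, 30], [10, 0]], 2000)
    share = [d[0] / (d[0] + d[1]) for d in draws[200:]]
    print(sum(share) / len(share))
```

Both conditional distributions are standard gamma draws, so no Metropolis-Hastings proposal has to be tuned.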

Posted Content•

Hierarchical multilinear models for multiway data

Peter D. Hoff¹•Institutions (1)

University of Washington¹

29 May 2010-arXiv: Methodology

TL;DR: In this article, a hierarchical model-based reduced-rank decomposition is proposed to accommodate a variety of data types such as longitudinal social networks and continuous multivariate data that are cross-classified by categorical variables.

Abstract: Reduced-rank decompositions provide descriptions of the variation among the elements of a matrix or array. In such decompositions, the elements of an array are expressed as products of low-dimensional latent factors. This article presents a model-based version of such a decomposition, extending the scope of reduced rank methods to accommodate a variety of data types such as longitudinal social networks and continuous multivariate data that are cross-classified by categorical variables. The proposed model-based approach is hierarchical, in that the latent factors corresponding to a given dimension of the array are not a priori independent, but exchangeable. Such a hierarchical approach allows more flexibility in the types of patterns that can be represented.

Posted Content•

Regularized Least-Mean-Square Algorithms

Yilun Chen¹, Yuantao Gu, Alfred O. Hero•Institutions (1)

University of Michigan¹

22 Dec 2010-arXiv: Methodology

TL;DR: In this paper, a family of regularized least-mean-square (LMS) algorithms for adaptive system identification with convex constraints is proposed, including sparse and group-sparse variants that improve convergence rate and steady-state error under sparsity assumptions on the true coefficient vector.

Abstract: We consider adaptive system identification problems with convex constraints and propose a family of regularized Least-Mean-Square (LMS) algorithms. We show that with a properly selected regularization parameter the regularized LMS provably dominates its conventional counterpart in terms of mean square deviations. We establish simple and closed-form expressions for choosing this regularization parameter. For identifying an unknown sparse system we propose sparse and group-sparse LMS algorithms, which are special examples of the regularized LMS family. Simulation results demonstrate the advantages of the proposed filters in both convergence rate and steady-state error under sparsity assumptions on the true coefficient vector.
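
To give a concrete picture of what a sparsity-regularized LMS filter can look like, here is a minimal sketch of an l1-regularized ("zero-attracting") LMS update. It is a generic textbook-style variant and is not claimed to be the exact algorithm or parameter rule of the paper. The function name `l1_regularized_lms`, the step size `mu`, the penalty weight `rho`, and the simulated system are all assumptions made for illustration.

```python
import numpy as np

def l1_regularized_lms(x, d, num_taps, mu=0.01, rho=1e-4):
    """LMS with an l1 (zero-attracting) penalty term (illustrative sketch).

    x: input signal, d: desired (observed) output, num_taps: filter length.
    mu and rho are hand-picked here, not chosen by the paper's closed form.
    """
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]   # u[k] = x[n - k]
        e = d[n] - w @ u                      # a priori estimation error
        # Standard LMS step plus a small pull of every tap toward zero.
        w += mu * e * u - rho * np.sign(w)
    return w

# Simulated sparse system (hypothetical): 16 taps, only two are nonzero.
rng = np.random.default_rng(0)
w_true = np.zeros(16)
w_true[2], w_true[9] = 1.0, -0.5
x = rng.standard_normal(5000)
noise = 0.01 * rng.standard_normal(5000)
d = np.convolve(x, w_true)[:len(x)] + noise
w_hat = l1_regularized_lms(x, d, num_taps=16)
```

Setting `rho=0` recovers the conventional LMS update, so the two filters can be compared on the same data. The penalty introduces a small bias on the nonzero taps in exchange for pulling the inactive taps toward zero.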

Posted Content•

The Degrees of Freedom of Partial Least Squares Regression

Nicole Kraemer, Masashi Sugiyama

22 Feb 2010-arXiv: Methodology

TL;DR: In this paper, an unbiased estimate of the degrees of freedom of PLS regression is proposed, defined as the trace of the first derivative of the fitted values, seen as a function of the response.

Abstract: The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares Regression. Our contribution is an unbiased estimate of its Degrees of Freedom. It is defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: The lower the collinearity is, the higher the Degrees of Freedom are. In particular, they are typically higher than the naive approach that defines the Degrees of Freedom as the number of components. Further, we illustrate how the Degrees of Freedom approach can be used for the comparison of different regression methods. In the experimental section, we show that our Degrees of Freedom estimate in combination with information criteria is useful for model selection.
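
The definition quoted in this abstract can be written out as follows. The notation is assumed here for illustration: $y$ is the response vector and $\hat{y}_m$ denotes the fitted values of PLS with $m$ components. Conventions such as how the intercept is counted may differ in the paper.

$$\widehat{\mathrm{DoF}}(m) = \operatorname{trace}\left(\frac{\partial \hat{y}_m}{\partial y}\right)$$

For a linear fit $\hat{y} = Hy$ in which $H$ does not depend on $y$, this reduces to $\operatorname{trace}(H)$. The PLS components are built using $y$, so $\hat{y}_m$ is a nonlinear function of $y$ and the derivative has to be computed explicitly.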

Journal Article•DOI•

Dempster--Shafer Theory and Statistical Inference with Weak Beliefs

Ryan Martin¹, Jianchun Zhang², Chuanhai Liu²•Institutions (2)

Indiana University – Purdue University Indianapolis¹, Purdue University²

03 Nov 2010-arXiv: Methodology

TL;DR: A general description of WB in the context of inferential models, its interplay with the DS calculus, and the maximal belief solution is presented, and new applications of the WB method in two high-dimensional hypothesis testing problems are given.

Abstract: The Dempster--Shafer (DS) theory is a powerful tool for probabilistic reasoning based on a formal calculus for combining evidence. DS theory has been widely used in computer science and engineering applications, but has yet to reach the statistical mainstream, perhaps because the DS belief functions do not satisfy long-run frequency properties. Recently, two of the authors proposed an extension of DS, called the weak belief (WB) approach, that can incorporate desirable frequency properties into the DS framework by systematically enlarging the focal elements. The present paper reviews and extends this WB approach. We present a general description of WB in the context of inferential models, its interplay with the DS calculus, and the maximal belief solution. New applications of the WB method in two high-dimensional hypothesis testing problems are given. Simulations show that the WB procedures, suitably calibrated, perform well compared to popular classical methods. Most importantly, the WB approach combines the probabilistic reasoning of DS with the desirable frequency properties of classical statistics.

Journal Article•DOI•

A mixed effects model for longitudinal relational and network data, with applications to international trade and conflict

Anton H. Westveld¹, Peter D. Hoff¹•Institutions (1)

University of Washington¹

08 Sep 2010-arXiv: Methodology

TL;DR: The network and temporal dependencies are represented with a random effects model, resulting in a stochastic process defined by a set of stationary covariance matrices and allowing for an intra- and inter-temporal representation of network structures.

Abstract: The focus of this paper is an approach to the modeling of longitudinal social network or relational data. Such data arise from measurements on pairs of objects or actors made at regular temporal intervals, resulting in a social network for each point in time. In this article we represent the network and temporal dependencies with a random effects model, resulting in a stochastic process defined by a set of stationary covariance matrices. Our approach builds upon the social relations models of Warner, Kenny and Stoto [Journal of Personality and Social Psychology 37 (1979) 1742--1757] and Gill and Swartz [Canad. J. Statist. 29 (2001) 321--331] and allows for an intra- and inter-temporal representation of network structures. We apply the methodology to two longitudinal data sets: international trade (continuous response) and militarized interstate disputes (binary response).

Posted Content•

Modeling Non-Stationary Processes Through Dimension Expansion

Luke Bornn¹, Gavin Shaddick², James V. Zidek¹•Institutions (2)

University of British Columbia¹, University of Bath²

11 Nov 2010-arXiv: Methodology

TL;DR: The proposed method works by expanding the geographic plane over which these processes evolve into higher-dimensional spaces, transforming and clarifying complex patterns in the physical plane, by combining aspects of multidimensional scaling, group lasso, and latent variable models.

Abstract: In this paper, we propose a novel approach to modeling nonstationary spatial fields. The proposed method works by expanding the geographic plane over which these processes evolve into higher dimensional spaces, transforming and clarifying complex patterns in the physical plane. By combining aspects of multi-dimensional scaling, group lasso, and latent variables models, a dimensionally sparse projection is found in which the originally nonstationary field exhibits stationarity. Following a comparison with existing methods in a simulated environment, dimension expansion is studied on a classic test-bed data set historically used to study nonstationary models. Following this, we explore the use of dimension expansion in modeling air pollution in the United Kingdom, a process known to be strongly influenced by rural/urban effects, amongst others, which gives rise to a nonstationary field.

Journal Article•DOI•

Dose Finding with Escalation with Overdose Control (EWOC) in Cancer Clinical Trials

Mourad Tighiouart¹, Andre Rogatko•Institutions (1)

Emory University¹

30 Nov 2010-arXiv: Methodology

TL;DR: Omitting an important predictor of toxicity when dose assignments to cancer patients are determined results in a high percentage of patients experiencing severe side effects and a significant proportion treated at sub-optimal doses, as shown in the recently completed ABR-217620 (naptumomab estafenatox) trial.

Abstract: Traditionally, the major objective in phase I trials is to identify a working-dose for subsequent studies, whereas the major endpoint in phase II and III trials is treatment efficacy. The dose sought is typically referred to as the maximum tolerated dose (MTD). Several statistical methodologies have been proposed to select the MTD in cancer phase I trials. In this manuscript, we focus on a Bayesian adaptive design, known as escalation with overdose control (EWOC). Several aspects of this design are discussed, including large sample properties of the sequence of doses selected in the trial, choice of prior distributions, and use of covariates. The methodology is exemplified with real-life examples of cancer phase I trials. In particular, we show in the recently completed ABR-217620 (naptumomab estafenatox) trial that omitting an important predictor of toxicity when dose assignments to cancer patients are determined results in a high percent of patients experiencing severe side effects and a significant proportion treated at sub-optimal doses.

Posted Content•

Robustness and accuracy of methods for high dimensional data analysis based on Student's t statistic

Aurore Delaigle¹, Peter Hall², Jiashun Jin³•Institutions (3)

University of Melbourne¹, University of California, Davis², Carnegie Mellon University³

22 Jan 2010-arXiv: Methodology

TL;DR: In this paper, the authors explore the features of the Student's $t$ statistic in the context of its application to very high dimensional problems, including feature selection and ranking, highly multiple hypothesis testing, and sparse, high dimensional signal detection.

Abstract: Student's $t$ statistic is finding applications today that were never envisaged when it was introduced more than a century ago. Many of these applications rely on properties, for example robustness against heavy tailed sampling distributions, that were not explicitly considered until relatively recently. In this paper we explore these features of the $t$ statistic in the context of its application to very high dimensional problems, including feature selection and ranking, highly multiple hypothesis testing, and sparse, high dimensional signal detection. Robustness properties of the $t$-ratio are highlighted, and it is established that those properties are preserved under applications of the bootstrap. In particular, bootstrap methods correct for skewness, and therefore lead to second-order accuracy, even in the extreme tails. Indeed, it is shown that the bootstrap, and also the more popular but less accurate $t$-distribution and normal approximations, are more effective in the tails than towards the middle of the distribution. These properties motivate new methods, for example bootstrap-based techniques for signal detection, that confine attention to the significant tail of a statistic.

Posted Content•

Dimension Reduction and Alleviation of Confounding for Spatial Generalized Linear Mixed Models

John Hughes¹, Murali Haran²•Institutions (2)

University of Minnesota¹, Pennsylvania State University²

30 Nov 2010-arXiv: Methodology

TL;DR: In this article, a new parameterization of the spatial generalized linear mixed model (SGLMM) is proposed, which alleviates spatial confounding and speeds computation by greatly reducing the dimension of spatial random effects.

Abstract: Non-Gaussian spatial data are very common in many disciplines. For instance, count data are common in disease mapping, and binary data are common in ecology. When fitting spatial regressions for such data, one needs to account for dependence to ensure reliable inference for the regression coefficients. The spatial generalized linear mixed model (SGLMM) offers a very popular and flexible approach to modeling such data, but the SGLMM suffers from three major shortcomings: (1) uninterpretability of parameters due to spatial confounding, (2) variance inflation due to spatial confounding, and (3) high-dimensional spatial random effects that make fully Bayesian inference for such models computationally challenging. We propose a new parameterization of the SGLMM that alleviates spatial confounding and speeds computation by greatly reducing the dimension of the spatial random effects. We illustrate the application of our approach to simulated binary, count, and Gaussian spatial datasets, and to a large infant mortality dataset.
