
Showing papers by "Rina Foygel Barber published in 2016"


Journal ArticleDOI
TL;DR: In this paper, a primal-dual algorithm is proposed for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The derivation uses a local upper bounding quadratic approximation to handle the non-convex spectral CT data discrepancy terms.
Abstract: We develop a primal-dual algorithm that allows for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The algorithm allows for image constraints to be enforced on the basis maps during the inversion. The derivation of the algorithm makes use of a local upper bounding quadratic approximation to generate descent steps for non-convex spectral CT data discrepancy terms, combined with a new convex-concave optimization algorithm. Convergence of the algorithm is demonstrated on simulated spectral CT data. Simulations with noise and anthropomorphic phantoms show examples of how to employ the constrained one-step algorithm for spectral CT data.

106 citations
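As a point of reference for the abstract above, here is a sketch of the standard spectral CT transmission model and Poisson-type data discrepancy that one-step inversion methods of this kind work with; the notation (spectrum s_i(E), mass attenuation mu_m(E), system matrix A, basis maps x_m) is mine and not necessarily the paper's.

```latex
% Expected transmission counts for measurement i, given basis maps x_1, ..., x_M:
\bar{c}_i(x) \;=\; \sum_{E} s_i(E)\,\exp\!\Big(-\sum_{m=1}^{M} \mu_m(E)\,(A x_m)_i\Big),
% with a transmission Poisson data discrepancy
D(x) \;=\; \sum_i \big( \bar{c}_i(x) - c_i \log \bar{c}_i(x) \big),
% which is non-convex in the basis maps; per the abstract, the algorithm majorizes such
% terms with local upper bounding quadratics inside a convex-concave primal-dual scheme.
```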


Posted Content
TL;DR: The main theoretical result proves that the SABHA method controls the FDR at a level that is at most slightly higher than the target FDR level, as long as the adaptive weights are constrained sufficiently so as not to overfit too much to the data.
Abstract: In multiple testing problems, where a large number of hypotheses are tested simultaneously, false discovery rate (FDR) control can be achieved with the well-known Benjamini-Hochberg procedure, which adapts to the amount of signal present in the data. Many modifications of this procedure have been proposed to improve power in scenarios where the hypotheses are organized into groups or into a hierarchy, as well as other structured settings. Here we introduce SABHA, the "structure-adaptive Benjamini-Hochberg algorithm", as a generalization of these adaptive testing methods. SABHA incorporates prior information about any pre-determined type of structure in the pattern of locations of the signals and nulls within the list of hypotheses, to reweight the p-values in a data-adaptive way. This raises the power by making more discoveries in regions where signals appear to be more common. Our main theoretical result proves that SABHA controls FDR at a level that is at most slightly higher than the target FDR level, as long as the adaptive weights are constrained sufficiently so as not to overfit too much to the data; interestingly, the excess FDR can be related to the Rademacher complexity or Gaussian width of the class from which we choose our data-adaptive weights. We apply this general framework to various structured settings, including ordered, grouped, and low total variation structures, and obtain bounds on the FDR for each specific setting. We also examine the empirical performance of SABHA on fMRI activity data and on gene/drug response data, as well as on simulated data.

91 citations
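To make the reweighting idea concrete, here is a minimal sketch of the weighted Benjamini-Hochberg step that a structure-adaptive procedure of this kind reduces to, assuming the data-adaptive weights q_hat (estimated local null proportions) have already been computed. This illustrates the general form only, not the paper's reference implementation; the constraints the weights must satisfy for the FDR guarantee are in the paper.

```python
import numpy as np

def weighted_bh(pvals, q_hat, alpha=0.1):
    """Benjamini-Hochberg step-up applied to reweighted p-values q_hat[i] * pvals[i].

    pvals : array of p-values, one per hypothesis
    q_hat : weights in (0, 1]; smaller values (signal-rich regions) make the
            corresponding hypotheses easier to reject
    Returns a boolean rejection mask.
    """
    pvals, q_hat = np.asarray(pvals, dtype=float), np.asarray(q_hat, dtype=float)
    n = len(pvals)
    weighted = q_hat * pvals
    order = np.argsort(weighted)
    sorted_w = weighted[order]
    # largest k such that the k-th smallest weighted p-value is <= alpha * k / n
    below = sorted_w <= alpha * np.arange(1, n + 1) / n
    k_hat = int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
    reject = np.zeros(n, dtype=bool)
    reject[order[:k_hat]] = True
    return reject
```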


Posted Content
TL;DR: In this article, the authors developed a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the total number of observational units.
Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy (a fake variable serving as a control) for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure 'discovers' important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is non-asymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype.

74 citations
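A hedged sketch of the split-and-screen workflow described in the abstract is below. The screening rule (cross-validated lasso) is just one plausible choice, and the knockoff inference step is passed in as a user-supplied callable `knockoff_select`, because the knockoff construction itself, and the paper's strategy for reusing information from the screening half, are beyond this sketch.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def screen_then_infer(X, y, knockoff_select, split_frac=0.5, fdr=0.1, seed=0):
    """Two-stage sketch: screen variables on one half of the data, then run a
    knockoff-filter selection (supplied by the caller) on the other half,
    restricted to the screened variables so the second-stage model is low-dimensional.

    knockoff_select(X_sub, y_sub, fdr) -> indices selected among the screened columns.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    split = int(split_frac * len(idx))
    screen_idx, infer_idx = idx[:split], idx[split:]

    # Stage 1: screening for a small set of potentially relevant variables.
    lasso = LassoCV(cv=5).fit(X[screen_idx], y[screen_idx])
    screened = np.flatnonzero(lasso.coef_)

    # Stage 2: knockoff-based inference on the held-out half, reduced model only.
    selected = knockoff_select(X[np.ix_(infer_idx, screened)], y[infer_idx], fdr)
    return screened[np.asarray(selected, dtype=int)]
```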


Proceedings Article
27 Jul 2016
TL;DR: The main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables the development of inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression.
Abstract: We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables us to develop inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression. We give numerical results to illustrate these tools on simulated data and on health record data.

43 citations
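For orientation, the kind of distributional result the abstract refers to typically takes the following truncated form; the notation is mine and this is a paraphrase, so the exact statement and conditioning should be taken from the paper.

```latex
% For y ~ N(mu, sigma^2 I) and the rank-k projection P onto the selected subspace,
% conditioning on the selection event, on the unit direction u = Py/\|Py\|, and on (I-P)y,
% the magnitude T = \|Py\| has a one-dimensional truncated density of the form
f(t) \;\propto\; t^{\,k-1} \exp\!\Big(\frac{t\,u^{\top}\mu}{\sigma^{2}} - \frac{t^{2}}{2\sigma^{2}}\Big)\,\mathbf{1}\{t \in S\},
% where S is the set of magnitudes consistent with the observed selection; p-values and
% confidence intervals for the selected group then come from this distribution.
```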


Journal ArticleDOI
TL;DR: This work evaluates the joint effects of temporally variable production and skeletal loss on postmortem age-frequency distributions (AFDs) to determine how fluctuations in production over the recent past can be detected from AFDs. It shows that, relative to the true timing of past production pulses, the modes of AFDs are shifted toward younger age cohorts, causing the true age of past pulses to be underestimated.
Abstract: Age-frequency distributions of dead skeletal material on the landscape or seabed—information on the time that has elapsed since the death of individuals—provide decadal- to millennial-scale perspectives both on the history of production and on the processes that lead to skeletal disintegration and burial. So far, however, models quantifying the dynamics of skeletal loss have assumed that skeletal production is constant during time-averaged accumulation. Here, to improve inferences in conservation paleobiology and historical ecology, we evaluate the joint effects of temporally variable production and skeletal loss on postmortem age-frequency distributions (AFDs) to determine how to detect fluctuations in production over the recent past from AFDs. We show that, relative to the true timing of past production pulses, the modes of AFDs will be shifted to younger age cohorts, causing the true age of past pulses to be underestimated. This shift in the apparent timing of a past pulse in production will be stronger where loss rates are high and/or the rate of decline in production is slow; also, a single pulse coupled with a declining loss rate can, under some circumstances, generate a bimodal distribution. We apply these models to death assemblages of the bivalve Nuculana taphria from the Southern California continental shelf, finding that: (1) an onshore-offshore gradient in time averaging is dominated by a gradient in the timing of production, reflecting the tracking of shallow-water habitats under a sea-level rise, rather than by a gradient in disintegration and sequestration rates, which remain constant with water depth; and (2) loss-corrected model-based estimates of the timing of past production are in good agreement with likely past changes in local production based on an independent sea-level curve.

39 citations
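The mode-shift effect described in the abstract is easy to reproduce in a toy simulation: a pulse of production followed by steady exponential skeletal loss yields an age-frequency distribution whose mode is younger than the true pulse. The parameter values below are illustrative only and are not taken from the paper.

```python
import numpy as np

def expected_afd(production, loss_rate, ages):
    """Expected age-frequency distribution (AFD) of a death assemblage.

    A cohort that entered the assemblage `a` years ago survives disintegration/burial
    with probability exp(-loss_rate * a), so the expected AFD is proportional to
    production(a) * exp(-loss_rate * a).
    """
    ages = np.asarray(ages, dtype=float)
    afd = np.array([production(a) for a in ages]) * np.exp(-loss_rate * ages)
    return afd / afd.sum()

# A Gaussian production pulse centered 800 years before present, with loss rate 1/500 per year:
ages = np.arange(0, 3000, 10)
pulse = lambda a: np.exp(-0.5 * ((a - 800) / 300) ** 2)
afd = expected_afd(pulse, loss_rate=1 / 500, ages=ages)
print("true pulse age: 800, AFD mode:", ages[np.argmax(afd)])  # mode is ~620, i.e. shifted younger
```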


Posted Content
TL;DR: The group knockoff filter is proposed, a method for false discovery rate control in a linear regression setting where the features are grouped and the goal is to select a set of relevant groups that have a nonzero effect on the response.
Abstract: We propose the group knockoff filter, a method for false discovery rate control in a linear regression setting where the features are grouped, and we would like to select a set of relevant groups which have a nonzero effect on the response. By considering the set of true and false discoveries at the group level, this method gains power relative to sparse regression methods. We also apply our method to the multitask regression problem where multiple response variables share similar sparsity patterns across the set of possible features. Empirically, the group knockoff filter successfully controls false discoveries at the group level in both settings, with substantially more discoveries made by leveraging the group structure.

37 citations
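The group-level selection step can be sketched as the usual knockoff (knockoff+) data-dependent threshold, applied to one statistic W_g per group rather than per feature. Constructing the group knockoffs and the statistics W themselves is the substance of the paper and is assumed done here.

```python
import numpy as np

def group_knockoff_select(W, fdr=0.1, offset=1):
    """Select groups via the knockoff threshold applied to group statistics W.

    W : one statistic per group; large positive values are evidence for the group,
        and the sign of W is symmetric under the null.  offset=1 gives knockoff+.
    Returns the indices of selected groups.
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):       # candidate thresholds, smallest first
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= fdr:                     # first (smallest) feasible threshold
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)
```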


Posted Content
TL;DR: Using the Framingham Heart Study, it is demonstrated how the LASSO tools can be used in genome-wide association studies, finding a number of genetic mutations which affect blood pressure and are therefore important for cardiovascular health.
Abstract: We present a new methodology for simultaneous variable selection and parameter estimation in function-on-scalar regression with an ultra-high dimensional predictor vector. We extend the LASSO to functional data in both the dense functional setting and the sparse functional setting. We provide theoretical guarantees which allow for an exponential number of predictor variables. Simulations are carried out which illustrate the methodology and compare the sparse/functional methods. Using the Framingham Heart Study, we demonstrate how our tools can be used in genome-wide association studies, finding a number of genetic mutations which affect blood pressure and are therefore important for cardiovascular health.

31 citations
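A sketch of the kind of penalized objective the abstract describes is below: each functional coefficient is expanded in K basis functions, and the block of basis coefficients belonging to a predictor is penalized with a group-lasso norm so that entire predictors can drop out. Proximal gradient with block soft-thresholding is my choice of solver for illustration; the basis representation, tuning, and the dense versus sparse functional settings follow the paper.

```python
import numpy as np

def block_soft_threshold(B, lam):
    """Prox of lam * sum_j ||B[j, :]||_2: shrink each predictor's coefficient block,
    zeroing it out entirely when its norm is at most lam."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12)) * B

def function_on_scalar_lasso(X, Y_coefs, lam, n_iter=500):
    """Fit min_B ||Y_coefs - X @ B||_F^2 / (2n) + lam * sum_j ||B[j, :]||_2.

    X       : (n, p) scalar predictors
    Y_coefs : (n, K) functional responses projected onto K basis functions
    B       : (p, K) basis coefficients of the functional coefficient curves
    """
    n, p = X.shape
    B = np.zeros((p, Y_coefs.shape[1]))
    step = n / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y_coefs) / n
        B = block_soft_threshold(B - step * grad, step * lam)
    return B
```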


Posted Content
TL;DR: The authors develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables, with procedures that apply to a broad class of group-sparse selection methods such as the group lasso, iterative hard thresholding, and forward stepwise regression.
Abstract: We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables us to develop inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression. We give numerical results to illustrate these tools on simulated data and on health record data.

30 citations


Proceedings Article
19 Jun 2016
TL;DR: The group knockoff filter, as described in this paper, selects a set of relevant groups which have a nonzero effect on the response; by considering the set of true and false discoveries at the group level, the method gains power relative to sparse regression methods.
Abstract: We propose the group knockoff filter, a method for false discovery rate control in a linear regression setting where the features are grouped, and we would like to select a set of relevant groups which have a nonzero effect on the response. By considering the set of true and false discoveries at the group level, this method gains power relative to sparse regression methods. We also apply our method to the multitask regression problem where multiple response variables share similar sparsity patterns across the set of possible features. Empirically, the group knockoff filter successfully controls false discoveries at the group level in both settings, with substantially more discoveries made by leveraging the group structure.

19 citations


Posted Content
TL;DR: This paper proposes a new framework, called Trimmed Conformal Prediction (TCP), based on a two-stage procedure (a trimming step and a prediction step) that can be applied to any regression method and offers both statistical accuracy and computational gains.
Abstract: In regression, conformal prediction is a general methodology to construct prediction intervals in a distribution-free manner. Although conformal prediction guarantees strong statistical properties for predictive inference, its inherent computational challenge has attracted the attention of researchers in the community. In this paper, we propose a new framework, called Trimmed Conformal Prediction (TCP), based on a two-stage procedure: a trimming step and a prediction step. The idea is to use a preliminary trimming step to substantially reduce the range of possible values for the prediction interval, after which applying conformal prediction becomes far more efficient. As with conformal prediction, TCP can be applied to any regression method, and further offers both statistical accuracy and computational gains. As a specific example, we also show how TCP can be implemented in the sparse regression setting. Experiments on both synthetic and real data validate the empirical performance of TCP.

15 citations
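A sketch of the two-stage idea follows: a cheap pilot fit trims the candidate values of the response at a new point to a finite range, and full conformal prediction (which refits the model for every candidate value) is then run only over that trimmed grid. The concrete choices here (ridge refits, residual-quantile trimming) are mine for illustration; the paper's trimming rules and guarantees differ in detail.

```python
import numpy as np
from sklearn.linear_model import Ridge

def trimmed_conformal_interval(X, y, x_new, alpha=0.1, trim=0.01, n_grid=200):
    """Approximate trimmed-conformal prediction interval for y at a new point x_new."""
    # --- Stage 1 (trimming): restrict candidate y values using a pilot fit ---
    pilot = Ridge(alpha=1.0).fit(X, y)
    half_width = 1.5 * np.quantile(np.abs(y - pilot.predict(X)), 1 - trim)
    center = pilot.predict(x_new.reshape(1, -1))[0]
    grid = np.linspace(center - half_width, center + half_width, n_grid)

    # --- Stage 2 (prediction): full conformal over the trimmed grid only ---
    X_aug, kept = np.vstack([X, x_new]), []
    for y_cand in grid:
        y_aug = np.append(y, y_cand)
        scores = np.abs(y_aug - Ridge(alpha=1.0).fit(X_aug, y_aug).predict(X_aug))
        # conformal p-value: how typical the candidate's residual is among all n+1
        if np.mean(scores >= scores[-1]) > alpha:
            kept.append(y_cand)
    return (min(kept), max(kept)) if kept else (center, center)
```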


Journal Article
TL;DR: The MOCCA (mirrored convex/concave) algorithm is proposed, a primal/dual optimization approach that takes a local convex approximation to each term at every iteration, and offers theoretical guarantees for convergence when the overall problem is approximately convex.
Abstract: Many optimization problems arising in high-dimensional statistics decompose naturally into a sum of several terms, where the individual terms are relatively simple but the composite objective function can only be optimized with iterative algorithms. In this paper, we are interested in optimization problems of the form F(Kx) + G(x), where K is a fixed linear transformation, while F and G are functions that may be nonconvex and/or nondifferentiable. In particular, if either of the terms is nonconvex, existing alternating minimization techniques may fail to converge; other types of existing approaches may instead be unable to handle nondifferentiability. We propose the MOCCA (mirrored convex/concave) algorithm, a primal/dual optimization approach that takes a local convex approximation to each term at every iteration. Inspired by optimization problems arising in computed tomography (CT) imaging, this algorithm can handle a range of nonconvex composite optimization problems, and offers theoretical guarantees for convergence when the overall problem is approximately convex (that is, any concavity in one term is balanced out by convexity in the other term). Empirical results show fast convergence for several structured signal recovery problems.
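For the convex special case, the template that MOCCA builds on is the classical primal-dual (Chambolle-Pock style) iteration for min_x F(Kx) + G(x); the sketch below shows that template only, with the local convex/concave approximation step that defines MOCCA itself left out, and the prox operators supplied by the caller.

```python
import numpy as np

def primal_dual(K, prox_G, prox_Fstar, x0, sigma, tau, n_iter=500):
    """Chambolle-Pock style iteration for min_x F(Kx) + G(x), convex F and G.

    prox_G(v, tau)       : prox operator of tau * G
    prox_Fstar(w, sigma) : prox operator of sigma * F* (convex conjugate of F)
    Requires sigma * tau * ||K||^2 < 1 for convergence in the convex case.
    """
    x = x0.copy()
    x_bar = x0.copy()
    w = np.zeros(K.shape[0])
    for _ in range(n_iter):
        w = prox_Fstar(w + sigma * (K @ x_bar), sigma)   # dual ascent step
        x_new = prox_G(x - tau * (K.T @ w), tau)         # primal descent step
        x_bar = 2 * x_new - x                            # extrapolation
        x = x_new
    return x
```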

Book ChapterDOI
01 Jan 2016
TL;DR: This work considers Bayesian variable selection in sparse high-dimensional regression, where the number of covariates p may be large relative to the sample size n, but at most a moderate number q of covariates are active, and treats generalized linear models.
Abstract: We consider Bayesian variable selection in sparse high-dimensional regression, where the number of covariates p may be large relative to the sample size n, but at most a moderate number q of covariates are active. Specifically, we treat generalized linear models. For a single fixed sparse model with well-behaved prior distribution, classical theory proves that the Laplace approximation to the marginal likelihood of the model is accurate for sufficiently large sample size n. We extend this theory by giving results on uniform accuracy of the Laplace approximation across all models in a high-dimensional scenario in which p and q, and thus also the number of considered models, may increase with n. Moreover, we show how this connection between marginal likelihood and Laplace approximation can be used to obtain consistency results for Bayesian approaches to variable selection in high-dimensional regression.
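The Laplace approximation referred to in the abstract is the standard one below (my notation); the chapter's contribution is establishing that it is accurate uniformly over all candidate models when p, q, and the number of models grow with n.

```latex
% For a model M with q active covariates, log-likelihood \ell_M, prior \pi_M,
% h(\theta) = \ell_M(\theta) + \log \pi_M(\theta), and posterior mode \hat\theta_M:
p(y \mid M) \;=\; \int e^{h(\theta)}\, d\theta
\;\approx\; e^{h(\hat\theta_M)}\,(2\pi)^{q/2}\,\bigl|-\nabla^{2} h(\hat\theta_M)\bigr|^{-1/2}.
```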