
Showing papers by "Rina Foygel Barber published in 2016"


Journal ArticleDOI
TL;DR: In this paper, a primal-dual algorithm is proposed for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The derivation uses a local upper bounding quadratic approximation to handle the non-convex spectral CT data discrepancy terms.
Abstract: We develop a primal-dual algorithm that allows for one-step inversion of spectral CT transmission photon counts data to a basis map decomposition. The algorithm allows for image constraints to be enforced on the basis maps during the inversion. The derivation of the algorithm makes use of a local upper bounding quadratic approximation to generate descent steps for non-convex spectral CT data discrepancy terms, combined with a new convex-concave optimization algorithm. Convergence of the algorithm is demonstrated on simulated spectral CT data. Simulations with noise and anthropomorphic phantoms show examples of how to employ the constrained one-step algorithm for spectral CT data.

106 citations
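As a point of reference for the abstract above, here is a sketch of the standard spectral CT transmission model and Poisson-type data discrepancy that one-step inversion methods of this kind work with; the notation (spectrum s_i(E), mass attenuation mu_m(E), system matrix A, basis maps x_m) is mine and not necessarily the paper's.

```latex
% Expected transmission counts for measurement i, given basis maps x_1, ..., x_M:
\bar{c}_i(x) \;=\; \sum_{E} s_i(E)\,\exp\!\Big(-\sum_{m=1}^{M} \mu_m(E)\,(A x_m)_i\Big),
% with a transmission Poisson data discrepancy
D(x) \;=\; \sum_i \big( \bar{c}_i(x) - c_i \log \bar{c}_i(x) \big),
% which is non-convex in the basis maps; per the abstract, the algorithm majorizes such
% terms with local upper bounding quadratics inside a convex-concave primal-dual scheme.
```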


Posted Content
TL;DR: The main theoretical result proves that the SABHA method controls the FDR at a level that is at most slightly higher than the target FDR level, as long as the adaptive weights are constrained sufficiently so as not to overfit too much to the data.
Abstract: In multiple testing problems, where a large number of hypotheses are tested simultaneously, false discovery rate (FDR) control can be achieved with the well-known Benjamini-Hochberg procedure, which adapts to the amount of signal present in the data. Many modifications of this procedure have been proposed to improve power in scenarios where the hypotheses are organized into groups or into a hierarchy, as well as other structured settings. Here we introduce SABHA, the "structure-adaptive Benjamini-Hochberg algorithm", as a generalization of these adaptive testing methods. SABHA incorporates prior information about any pre-determined type of structure in the pattern of locations of the signals and nulls within the list of hypotheses, to reweight the p-values in a data-adaptive way. This raises the power by making more discoveries in regions where signals appear to be more common. Our main theoretical result proves that SABHA controls FDR at a level that is at most slightly higher than the target FDR level, as long as the adaptive weights are constrained sufficiently so as not to overfit too much to the data; interestingly, the excess FDR can be related to the Rademacher complexity or Gaussian width of the class from which we choose our data-adaptive weights. We apply this general framework to various structured settings, including ordered, grouped, and low total variation structures, and obtain bounds on the FDR for each specific setting. We also examine the empirical performance of SABHA on fMRI activity data and on gene/drug response data, as well as on simulated data.

91 citations
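To make the reweighting idea concrete, here is a minimal sketch of the weighted Benjamini-Hochberg step that a structure-adaptive procedure of this kind reduces to, assuming the data-adaptive weights q_hat (estimated local null proportions) have already been computed. This illustrates the general form only, not the paper's reference implementation; the constraints the weights must satisfy for the FDR guarantee are in the paper.

```python
import numpy as np

def weighted_bh(pvals, q_hat, alpha=0.1):
    """Benjamini-Hochberg step-up applied to reweighted p-values q_hat[i] * pvals[i].

    pvals : array of p-values, one per hypothesis
    q_hat : weights in (0, 1]; smaller values (signal-rich regions) make the
            corresponding hypotheses easier to reject
    Returns a boolean rejection mask.
    """
    pvals, q_hat = np.asarray(pvals, dtype=float), np.asarray(q_hat, dtype=float)
    n = len(pvals)
    weighted = q_hat * pvals
    order = np.argsort(weighted)
    sorted_w = weighted[order]
    # largest k such that the k-th smallest weighted p-value is <= alpha * k / n
    below = sorted_w <= alpha * np.arange(1, n + 1) / n
    k_hat = int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
    reject = np.zeros(n, dtype=bool)
    reject[order[:k_hat]] = True
    return reject
```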


Posted Content
TL;DR: In this article, the authors developed a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the total number of observational units.
Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy (a fake variable serving as a control) for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure 'discovers' important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is non-asymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype.

74 citations
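A hedged sketch of the split-and-screen workflow described in the abstract is below. The screening rule (cross-validated lasso) is just one plausible choice, and the knockoff inference step is passed in as a user-supplied callable `knockoff_select`, because the knockoff construction itself, and the paper's strategy for reusing information from the screening half, are beyond this sketch.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def screen_then_infer(X, y, knockoff_select, split_frac=0.5, fdr=0.1, seed=0):
    """Two-stage sketch: screen variables on one half of the data, then run a
    knockoff-filter selection (supplied by the caller) on the other half,
    restricted to the screened variables so the second-stage model is low-dimensional.

    knockoff_select(X_sub, y_sub, fdr) -> indices selected among the screened columns.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    split = int(split_frac * len(idx))
    screen_idx, infer_idx = idx[:split], idx[split:]

    # Stage 1: screening for a small set of potentially relevant variables.
    lasso = LassoCV(cv=5).fit(X[screen_idx], y[screen_idx])
    screened = np.flatnonzero(lasso.coef_)

    # Stage 2: knockoff-based inference on the held-out half, reduced model only.
    selected = knockoff_select(X[np.ix_(infer_idx, screened)], y[infer_idx], fdr)
    return screened[np.asarray(selected, dtype=int)]
```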


Proceedings Article
27 Jul 2016
TL;DR: The main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables the development of inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression.
Abstract: We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables us to develop inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression. We give numerical results to illustrate these tools on simulated data and on health record data.

43 citations
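For orientation, the kind of distributional result the abstract refers to typically takes the following truncated form; the notation is mine and this is a paraphrase, so the exact statement and conditioning should be taken from the paper.

```latex
% For y ~ N(mu, sigma^2 I) and the rank-k projection P onto the selected subspace,
% conditioning on the selection event, on the unit direction u = Py/\|Py\|, and on (I-P)y,
% the magnitude T = \|Py\| has a one-dimensional truncated density of the form
f(t) \;\propto\; t^{\,k-1} \exp\!\Big(\frac{t\,u^{\top}\mu}{\sigma^{2}} - \frac{t^{2}}{2\sigma^{2}}\Big)\,\mathbf{1}\{t \in S\},
% where S is the set of magnitudes consistent with the observed selection; p-values and
% confidence intervals for the selected group then come from this distribution.
```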


Journal ArticleDOI
TL;DR: This work evaluates the joint effects of temporally variable production and skeletal loss on postmortem age-frequency distributions (AFDs) to determine how fluctuations in production over the recent past can be detected from AFDs. It shows that, relative to the true timing of past production pulses, the modes of AFDs are shifted toward younger age cohorts, causing the true age of past pulses to be underestimated.
Abstract: Age-frequency distributions of dead skeletal material on the landscape or seabed—information on the time that has elapsed since the death of individuals—provide decadal- to millennial-scale perspectives both on the history of production and on the processes that lead to skeletal disintegration and burial. So far, however, models quantifying the dynamics of skeletal loss have assumed that skeletal production is constant during time-averaged accumulation. Here, to improve inferences in conservation paleobiology and historical ecology, we evaluate the joint effects of temporally variable production and skeletal loss on postmortem age-frequency distributions (AFDs) to determine how to detect fluctuations in production over the recent past from AFDs. We show that, relative to the true timing of past production pulses, the modes of AFDs will be shifted to younger age cohorts, causing the true age of past pulses to be underestimated. This shift in the apparent timing of a past pulse in production will be stronger where loss rates are high and/or the rate of decline in production is slow; also, a single pulse coupled with a declining loss rate can, under some circumstances, generate a bimodal distribution. We apply these models to death assemblages of the bivalve Nuculana taphria from the Southern California continental shelf, finding that: (1) an onshore-offshore gradient in time averaging is dominated by a gradient in the timing of production, reflecting the tracking of shallow-water habitats under a sea-level rise, rather than by a gradient in disintegration and sequestration rates, which remain constant with water depth; and (2) loss-corrected model-based estimates of the timing of past production are in good agreement with likely past changes in local production based on an independent sea-level curve.

39 citations
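The mode-shift effect described in the abstract is easy to reproduce in a toy simulation: a pulse of production followed by steady exponential skeletal loss yields an age-frequency distribution whose mode is younger than the true pulse. The parameter values below are illustrative only and are not taken from the paper.

```python
import numpy as np

def expected_afd(production, loss_rate, ages):
    """Expected age-frequency distribution (AFD) of a death assemblage.

    A cohort that entered the assemblage `a` years ago survives disintegration/burial
    with probability exp(-loss_rate * a), so the expected AFD is proportional to
    production(a) * exp(-loss_rate * a).
    """
    ages = np.asarray(ages, dtype=float)
    afd = np.array([production(a) for a in ages]) * np.exp(-loss_rate * ages)
    return afd / afd.sum()

# A Gaussian production pulse centered 800 years before present, with loss rate 1/500 per year:
ages = np.arange(0, 3000, 10)
pulse = lambda a: np.exp(-0.5 * ((a - 800) / 300) ** 2)
afd = expected_afd(pulse, loss_rate=1 / 500, ages=ages)
print("true pulse age: 800, AFD mode:", ages[np.argmax(afd)])  # mode is ~620, i.e. shifted younger
```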


Posted Content
TL;DR: The group knockoff filter is proposed, a method for false discovery rate control in a linear regression setting where the features are grouped and the goal is to select a set of relevant groups that have a nonzero effect on the response.
Abstract: We propose the group knockoff filter, a method for false discovery rate control in a linear regression setting where the features are grouped, and we would like to select a set of relevant groups which have a nonzero effect on the response. By considering the set of true and false discoveries at the group level, this method gains power relative to sparse regression methods. We also apply our method to the multitask regression problem where multiple response variables share similar sparsity patterns across the set of possible features. Empirically, the group knockoff filter successfully controls false discoveries at the group level in both settings, with substantially more discoveries made by leveraging the group structure.

37 citations
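The group-level selection step can be sketched as the usual knockoff (knockoff+) data-dependent threshold, applied to one statistic W_g per group rather than per feature. Constructing the group knockoffs and the statistics W themselves is the substance of the paper and is assumed done here.

```python
import numpy as np

def group_knockoff_select(W, fdr=0.1, offset=1):
    """Select groups via the knockoff threshold applied to group statistics W.

    W : one statistic per group; large positive values are evidence for the group,
        and the sign of W is symmetric under the null.  offset=1 gives knockoff+.
    Returns the indices of selected groups.
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):       # candidate thresholds, smallest first
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= fdr:                     # first (smallest) feasible threshold
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)
```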


Posted Content
TL;DR: Using the Framingham Heart Study, it is demonstrated how the LASSO tools can be used in genome-wide association studies, finding a number of genetic mutations which affect blood pressure and are therefore important for cardiovascular health.
Abstract: We present a new methodology for simultaneous variable selection and parameter estimation in function-on-scalar regression with an ultra-high dimensional predictor vector. We extend the LASSO to functional data in both the dense functional setting and the sparse functional setting. We provide theoretical guarantees which allow for an exponential number of predictor variables. Simulations are carried out which illustrate the methodology and compare the sparse/functional methods. Using the Framingham Heart Study, we demonstrate how our tools can be used in genome-wide association studies, finding a number of genetic mutations which affect blood pressure and are therefore important for cardiovascular health.

31 citations
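A sketch of the kind of penalized objective the abstract describes is below: each functional coefficient is expanded in K basis functions, and the block of basis coefficients belonging to a predictor is penalized with a group-lasso norm so that entire predictors can drop out. Proximal gradient with block soft-thresholding is my choice of solver for illustration; the basis representation, tuning, and the dense versus sparse functional settings follow the paper.

```python
import numpy as np

def block_soft_threshold(B, lam):
    """Prox of lam * sum_j ||B[j, :]||_2: shrink each predictor's coefficient block,
    zeroing it out entirely when its norm is at most lam."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12)) * B

def function_on_scalar_lasso(X, Y_coefs, lam, n_iter=500):
    """Fit min_B ||Y_coefs - X @ B||_F^2 / (2n) + lam * sum_j ||B[j, :]||_2.

    X       : (n, p) scalar predictors
    Y_coefs : (n, K) functional responses projected onto K basis functions
    B       : (p, K) basis coefficients of the functional coefficient curves
    """
    n, p = X.shape
    B = np.zeros((p, Y_coefs.shape[1]))
    step = n / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y_coefs) / n
        B = block_soft_threshold(B - step * grad, step * lam)
    return B
```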


Posted Content
TL;DR: The authors develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables, with procedures that apply to a broad class of group-sparse selection methods such as the group lasso, iterative hard thresholding, and forward stepwise regression.
Abstract: We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables us to develop inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression. We give numerical results to illustrate these tools on simulated data and on health record data.

30 citations


Proceedings Article
19 Jun 2016
TL;DR: The group knockoff filter, as described in this paper, selects a set of relevant groups which have a nonzero effect on the response; by considering the set of true and false discoveries at the group level, the method gains power relative to sparse regression methods.
Abstract: We propose the group knockoff filter, a method for false discovery rate control in a linear regression setting where the features are grouped, and we would like to select a set of relevant groups which have a nonzero effect on the response. By considering the set of true and false discoveries at the group level, this method gains power relative to sparse regression methods. We also apply our method to the multitask regression problem where multiple response variables share similar sparsity patterns across the set of possible features. Empirically, the group knockoff filter successfully controls false discoveries at the group level in both settings, with substantially more discoveries made by leveraging the group structure.

19 citations


Posted Content
TL;DR: This paper proposes a new framework, called Trimmed Conformal Prediction (TCP), based on a two-stage procedure (a trimming step and a prediction step) that can be applied to any regression method and offers both statistical accuracy and computational gains.
Abstract: In regression, conformal prediction is a general methodology to construct prediction intervals in a distribution-free manner. Although conformal prediction guarantees strong statistical properties for predictive inference, its inherent computational challenge has attracted the attention of researchers in the community. In this paper, we propose a new framework, called Trimmed Conformal Prediction (TCP), based on a two-stage procedure: a trimming step and a prediction step. The idea is to use a preliminary trimming step to substantially reduce the range of possible values for the prediction interval, after which applying conformal prediction becomes far more efficient. As with conformal prediction, TCP can be applied to any regression method, and further offers both statistical accuracy and computational gains. As a specific example, we also show how TCP can be implemented in the sparse regression setting. Experiments on both synthetic and real data validate the empirical performance of TCP.

15 citations
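A sketch of the two-stage idea follows: a cheap pilot fit trims the candidate values of the response at a new point to a finite range, and full conformal prediction (which refits the model for every candidate value) is then run only over that trimmed grid. The concrete choices here (ridge refits, residual-quantile trimming) are mine for illustration; the paper's trimming rules and guarantees differ in detail.

```python
import numpy as np
from sklearn.linear_model import Ridge

def trimmed_conformal_interval(X, y, x_new, alpha=0.1, trim=0.01, n_grid=200):
    """Approximate trimmed-conformal prediction interval for y at a new point x_new."""
    # --- Stage 1 (trimming): restrict candidate y values using a pilot fit ---
    pilot = Ridge(alpha=1.0).fit(X, y)
    half_width = 1.5 * np.quantile(np.abs(y - pilot.predict(X)), 1 - trim)
    center = pilot.predict(x_new.reshape(1, -1))[0]
    grid = np.linspace(center - half_width, center + half_width, n_grid)

    # --- Stage 2 (prediction): full conformal over the trimmed grid only ---
    X_aug, kept = np.vstack([X, x_new]), []
    for y_cand in grid:
        y_aug = np.append(y, y_cand)
        scores = np.abs(y_aug - Ridge(alpha=1.0).fit(X_aug, y_aug).predict(X_aug))
        # conformal p-value: how typical the candidate's residual is among all n+1
        if np.mean(scores >= scores[-1]) > alpha:
            kept.append(y_cand)
    return (min(kept), max(kept)) if kept else (center, center)
```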


Journal Article
TL;DR: The MOCCA (mirrored convex/concave) algorithm is proposed, a primal/dual optimization approach that takes a local convex approximation to each term at every iteration, and offers theoretical guarantees for convergence when the overall problem is approximately convex.
Abstract: Many optimization problems arising in high-dimensional statistics decompose naturally into a sum of several terms, where the individual terms are relatively simple but the composite objective function can only be optimized with iterative algorithms. In this paper, we are interested in optimization problems of the form F(Kx) + G(x), where K is a fixed linear transformation, while F and G are functions that may be nonconvex and/or nondifferentiable. In particular, if either of the terms is nonconvex, existing alternating minimization techniques may fail to converge; other types of existing approaches may instead be unable to handle nondifferentiability. We propose the MOCCA (mirrored convex/concave) algorithm, a primal/dual optimization approach that takes a local convex approximation to each term at every iteration. Inspired by optimization problems arising in computed tomography (CT) imaging, this algorithm can handle a range of nonconvex composite optimization problems, and offers theoretical guarantees for convergence when the overall problem is approximately convex (that is, any concavity in one term is balanced out by convexity in the other term). Empirical results show fast convergence for several structured signal recovery problems.
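For the convex special case, the template that MOCCA builds on is the classical primal-dual (Chambolle-Pock style) iteration for min_x F(Kx) + G(x); the sketch below shows that template only, with the local convex/concave approximation step that defines MOCCA itself left out, and the prox operators supplied by the caller.

```python
import numpy as np

def primal_dual(K, prox_G, prox_Fstar, x0, sigma, tau, n_iter=500):
    """Chambolle-Pock style iteration for min_x F(Kx) + G(x), convex F and G.

    prox_G(v, tau)       : prox operator of tau * G
    prox_Fstar(w, sigma) : prox operator of sigma * F* (convex conjugate of F)
    Requires sigma * tau * ||K||^2 < 1 for convergence in the convex case.
    """
    x = x0.copy()
    x_bar = x0.copy()
    w = np.zeros(K.shape[0])
    for _ in range(n_iter):
        w = prox_Fstar(w + sigma * (K @ x_bar), sigma)   # dual ascent step
        x_new = prox_G(x - tau * (K.T @ w), tau)         # primal descent step
        x_bar = 2 * x_new - x                            # extrapolation
        x = x_new
    return x
```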

Book ChapterDOI
01 Jan 2016
TL;DR: This work considers Bayesian variable selection in sparse high-dimensional regression, where the number of covariates p may be large relative to the sample size n, but at most a moderate number q of covariates are active, and treats generalized linear models.
Abstract: We consider Bayesian variable selection in sparse high-dimensional regression, where the number of covariates p may be large relative to the sample size n, but at most a moderate number q of covariates are active. Specifically, we treat generalized linear models. For a single fixed sparse model with well-behaved prior distribution, classical theory proves that the Laplace approximation to the marginal likelihood of the model is accurate for sufficiently large sample size n. We extend this theory by giving results on uniform accuracy of the Laplace approximation across all models in a high-dimensional scenario in which p and q, and thus also the number of considered models, may increase with n. Moreover, we show how this connection between marginal likelihood and Laplace approximation can be used to obtain consistency results for Bayesian approaches to variable selection in high-dimensional regression.
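The Laplace approximation referred to in the abstract is the standard one below (my notation); the chapter's contribution is establishing that it is accurate uniformly over all candidate models when p, q, and the number of models grow with n.

```latex
% For a model M with q active covariates, log-likelihood \ell_M, prior \pi_M,
% h(\theta) = \ell_M(\theta) + \log \pi_M(\theta), and posterior mode \hat\theta_M:
p(y \mid M) \;=\; \int e^{h(\theta)}\, d\theta
\;\approx\; e^{h(\hat\theta_M)}\,(2\pi)^{q/2}\,\bigl|-\nabla^{2} h(\hat\theta_M)\bigr|^{-1/2}.
```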