Sparse inverse covariance estimation with the lasso

Home
/
Papers
/
Sparse inverse covariance estimation with the lasso

Posted Content•

Sparse inverse covariance estimation with the lasso

Jerome H. Friedman¹, Trevor Hastie¹, Robert Tibshirani¹•Institutions (1)

27 Aug 2007-arXiv: Methodology-

TL;DR: A simple algorithm, using a coordinate descent procedure for the lasso, is developed that solves a 1000 node problem in at most a minute, and is 30 to 4000 times faster than competing methods.

read less

Abstract: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm| the Graphical Lasso| that is remarkably fast: it solves a 1000 node problem (» 500; 000 parameters) in at most a minute, and is 30 to 4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen & B˜ uhlmann (2006). We illustrate the method on some cell-signaling data from proteomics.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•

A Selective Overview of Variable Selection in High Dimensional Feature Space.

[...]

Jianqing Fan¹, Jinchi Lv²•Institutions (2)

Princeton University¹, University of Southern California²

01 Jan 2010-Statistica Sinica

TL;DR: In this paper, a brief account of the recent developments of theory, methods, and implementations for high-dimensional variable selection is presented, with emphasis on independence screening and two-scale methods.

...read moreread less

Abstract: High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.

...read moreread less

892 citations

Journal Article•DOI•

Abnormal intrinsic brain functional network dynamics in Parkinson's disease.

[...]

Jinhee Kim¹, Marion Criaud¹, Marion Criaud², Sang Soo Cho², Sang Soo Cho¹, María Díez-Cirarda¹, María Díez-Cirarda², Alexander Mihaescu¹, Alexander Mihaescu², Sarah Coakeley², Sarah Coakeley¹, Christine Ghadery¹, Christine Ghadery², Mikaeel Valli², Mikaeel Valli¹, Mark F. Jacobs², Mark F. Jacobs¹, Sylvain Houle², Antonio P. Strafella², Antonio P. Strafella¹ - Show less +16 more•Institutions (2)

University Health Network¹, Centre for Addiction and Mental Health²

01 Nov 2017-Brain

TL;DR: The altered functional segregation and abnormal global integration in brain networks confirmed the vulnerability of functional connectivity networks in Parkinson’s disease.

...read moreread less

Abstract: Parkinson’s disease is a neurodegenerative disorder characterized by nigrostriatal dopamine depletion. Previous studies measuring spontaneous brain activity using resting state functional magnetic resonance imaging have reported abnormal changes in broadly distributed whole-brain networks. Although resting state functional connectivity, estimating temporal correlations between brain regions, is measured with the assumption that intrinsic fluctuations throughout the scan are stable, dynamic changes of functional connectivity have recently been suggested to reflect aspects of functional capacity of neural systems, and thus may serve as biomarkers of disease. The present work is the first study to investigate the dynamic functional connectivity in patients with Parkinson’s disease, with a focus on the temporal properties of functional connectivity states as well as the variability of network topological organization using resting state functional magnetic resonance imaging. Thirty-one Parkinson’s disease patients and 23 healthy controls were studied using group spatial independent component analysis, a sliding windows approach, and graph-theory methods. The dynamic functional connectivity analyses suggested two discrete connectivity configurations: a more frequent, sparsely connected within-network state (State I) and a less frequent, more strongly interconnected between-network state (State II). In patients with Parkinson’s disease, the occurrence of the sparsely connected State I dropped by 12.62%, while the expression of the more strongly interconnected State II increased by the same amount. This was consistent with the altered temporal properties of the dynamic functional connectivity characterized by a shortening of the dwell time of State I and by a proportional increase of the dwell time pattern in State II. These changes are suggestive of a reduction in functional segregation among networks and are correlated with the clinical severity of Parkinson’s disease symptoms. Additionally, there was a higher variability in the network global efficiency, suggesting an abnormal global integration of the brain networks. The altered functional segregation and abnormal global integration in brain networks confirmed the vulnerability of functional connectivity networks in Parkinson’s disease.

...read moreread less

234 citations

Journal Article•DOI•

Regularized estimation of large-scale gene association networks using graphical Gaussian models

[...]

Nicole C. Krämer¹, Juliane Schäfer², Juliane Schäfer³, Anne-Laure Boulesteix⁴•Institutions (4)

Analysis Group¹, ETH Zurich², University Hospital of Basel³, Ludwig Maximilian University of Munich⁴

24 Nov 2009-BMC Bioinformatics

TL;DR: A general framework for combining regularized regression methods with the estimation of Graphical Gaussian models is investigated, which includes various existing methods as well as two new approaches based on ridge regression and adaptive lasso, respectively.

...read moreread less

Abstract: Graphical Gaussian models are popular tools for the estimation of (undirected) gene association networks from microarray data. A key issue when the number of variables greatly exceeds the number of samples is the estimation of the matrix of partial correlations. Since the (Moore-Penrose) inverse of the sample covariance matrix leads to poor estimates in this scenario, standard methods are inappropriate and adequate regularization techniques are needed. Popular approaches include biased estimates of the covariance matrix and high-dimensional regression schemes, such as the Lasso and Partial Least Squares. In this article, we investigate a general framework for combining regularized regression methods with the estimation of Graphical Gaussian models. This framework includes various existing methods as well as two new approaches based on ridge regression and adaptive lasso, respectively. These methods are extensively compared both qualitatively and quantitatively within a simulation study and through an application to six diverse real data sets. In addition, all proposed algorithms are implemented in the R package "parcor", available from the R repository CRAN. In our simulation studies, the investigated non-sparse regression methods, i.e. Ridge Regression and Partial Least Squares, exhibit rather conservative behavior when combined with (local) false discovery rate multiple testing in order to decide whether or not an edge is present in the network. For networks with higher densities, the difference in performance of the methods decreases. For sparse networks, we confirm the Lasso's well known tendency towards selecting too many edges, whereas the two-stage adaptive Lasso is an interesting alternative that provides sparser solutions. In our simulations, both sparse and non-sparse methods are able to reconstruct networks with cluster structures. On six real data sets, we also clearly distinguish the results obtained using the non-sparse methods and those obtained using the sparse methods where specification of the regularization parameter automatically means model selection. In five out of six data sets, Partial Least Squares selects very dense networks. Furthermore, for data that violate the assumption of uncorrelated observations (due to replications), the Lasso and the adaptive Lasso yield very complex structures, indicating that they might not be suited under these conditions. The shrinkage approach is more stable than the regression based approaches when using subsampling.

...read moreread less

224 citations

Journal Article•DOI•

Transposable regularized covariance models with an application to missing data imputation

[...]

Genevera I. Allen¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

01 Jun 2010-The Annals of Applied Statistics

TL;DR: In this paper, a transposable regularized covariance model is proposed to estimate the mean and non-singular covariance matrices of high-dimensional data in the form of a matrix, where rows and columns each have a separate mean vector and covariance matrix.

...read moreread less

Abstract: Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.

...read moreread less

133 citations

Journal Article•DOI•

Regularized estimation of large-scale gene association networks using graphical Gaussian models

[...]

Nicole Kraemer, Juliane Schaefer, Anne-Laure Boulesteix

05 May 2009-arXiv: Methodology

TL;DR: In this article, the authors investigate a general framework for combining regularized regression methods with the estimation of Graphical Gaussian models, including various existing methods as well as two new approaches based on ridge regression and adaptive lasso, respectively.

...read moreread less

129 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

Collapse

References

PDF

Open Access

More filters

Book•

Convex Optimization

[...]

Stephen Boyd¹, Lieven Vandenberghe²•Institutions (2)

Stanford University¹, University of California, Los Angeles²

01 Mar 2004

TL;DR: In this article, the focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them, and a comprehensive introduction to the subject is given. But the focus of this book is not on the optimization problem itself, but on the problem of finding the appropriate technique to solve it.

...read moreread less

Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

...read moreread less

33,341 citations

Journal Article•DOI•

High-dimensional graphs and variable selection with the Lasso

[...]

Nicolai Meinshausen, Peter Bühlmann

01 Jun 2006-Annals of Statistics

TL;DR: It is shown that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs and is hence equivalent to variable selection for Gaussian linear models.

...read moreread less

Abstract: The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.

...read moreread less

3,793 citations

"Sparse inverse covariance estimatio..." refers background or methods in this paper

...Both papers also establish that the simpler approach of Meinshausen & Bühlmann (2006) can be viewed as an approximation to the exact problem....
[...]
...Meinshausen & Bühlmann (2006) take a simple approach to this problem: they estimate a sparse graphical model by fitting a lasso model to each variable, using the others as predictors....
[...]
...As pointed out by Banerjee et al. (2007), W11 6= S11 in general and hence the Meinshausen & Bühlmann (2006) approach does not yield the maximum likelihood estimator....
[...]
...In fact if W11 = S11, then the solutions β̂ are easily seen to equal one-half of the lasso estimates for the pth variable on the others, and hence related to the Meinshausen & Bühlmann (2006) proposal....
[...]
...It also bridges the “conceptual gap” between the Meinshausen & Bühlmann (2006) proposal and the exact problem....
[...]

Journal Article•DOI•

Model selection and estimation in the Gaussian graphical model

[...]

Ming Yuan¹, Yi Lin²•Institutions (2)

Georgia Institute of Technology¹, University of Wisconsin-Madison²

01 Mar 2007-Biometrika

TL;DR: The implementation of the penalized likelihood methods for estimating the concentration matrix in the Gaussian graphical model is nontrivial, but it is shown that the computation can be done effectively by taking advantage of the efficient maxdet algorithm developed in convex optimization.

...read moreread less

Abstract: SUMMARY We propose penalized likelihood methods for estimating the concentration matrix in the Gaussian graphical model. The methods lead to a sparse and shrinkage estimator of the concentration matrix that is positive definite, and thus conduct model selection and estimation simultaneously. The implementation of the methods is nontrivial because of the positive definite constraint on the concentration matrix, but we show that the computation can be done effectively by taking advantage of the efficient maxdet algorithm developed in convex optimization. We propose a BIC-type criterion for the selection of the tuning parameter in the penalized likelihood methods. The connection between our methods and existing methods is illustrated. Simulations and real examples demonstrate the competitive performance of the new methods.

...read moreread less

1,824 citations

"Sparse inverse covariance estimatio..." refers methods in this paper

...The sparse scenario is the AR(1) model taken from Yuan & Lin (2007): βii = 1, βi,i−1 = βi−1,i = 0.5, and zero otherwise....
[...]
...Expression (1) is the Gaussian log-likelihood of the data, partially maximized with respect to the mean parameter µ. Yuan & Lin (2007) solve this problem using the interior point method for the “maxdet” problem, proposed by Vandenberghe et al. (1998)....
[...]
...Other authors have proposed algorithms for the exact maximization of the L1-penalized log-likelihood; both Yuan & Lin (2007) and Banerjee et al. (2007) adapt interior point optimization methods for the solution to this problem....
[...]

Journal Article•DOI•

Pathwise coordinate optimization

[...]

Jerome H. Friedman, Trevor Hastie, Holger Höfling, Robert Tibshirani

10 Aug 2007-arXiv: Computation

TL;DR: It is shown that coordinate descent is very competitive with the well-known LARS procedure in large lasso problems, can deliver a path of solutions efficiently, and can be applied to many other convex statistical problems such as the garotte and elastic net.

...read moreread less

Abstract: We consider ``one-at-a-time'' coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the $L_1$-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the ``fused lasso,'' however, so we derive a generalized algorithm that yields the solution in much less time that a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.

...read moreread less

1,785 citations

"Sparse inverse covariance estimatio..." refers background in this paper

...The lasso problem in step (2) above can be efficiently solved by coordinate descent (Friedman et al. (2007),Wu & Lange (2007))....
[...]
...It also bridges the “conceptual gap” between the Meinshausen & Bühlmann (2006) proposal and the exact problem....
[...]
...We do, to great advantage, because fast coordinate descent algorithms (Friedman et al. 2007) make solution of the lasso problem very attractive....
[...]

Journal Article•DOI•

Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data

[...]

Karen Sachs¹, Karen Sachs², Karen Sachs³, Omar D. Perez¹, Omar D. Perez², Omar D. Perez³, Dana Pe'er³, Dana Pe'er¹, Dana Pe'er², Douglas A. Lauffenburger¹, Douglas A. Lauffenburger³, Douglas A. Lauffenburger², Garry P. Nolan², Garry P. Nolan¹, Garry P. Nolan³ - Show less +11 more•Institutions (3)

Massachusetts Institute of Technology¹, Stanford University², Harvard University³

22 Apr 2005-Science

TL;DR: Reconstruction of network models from physiologically relevant primary single cells might be applied to understanding native-state tissue signaling biology, complex drug actions, and dysfunctional signaling in diseased cells.

...read moreread less

Abstract: Machine learning was applied for the automated derivation of causal influences in cellular signaling networks. This derivation relied on the simultaneous measurement of multiple phosphorylated protein and phospholipid components in thousands of individual primary human immune system cells. Perturbing these cells with molecular interventions drove the ordering of connections between pathway components, wherein Bayesian network computational methods automatically elucidated most of the traditionally reported signaling relationships and predicted novel interpathway network causalities, which we verified experimentally. Reconstruction of network models from physiologically relevant primary single cells might be applied to understanding native-state tissue signaling biology, complex drug actions, and dysfunctional signaling in diseased cells.

...read moreread less

1,736 citations