Showing papers by "Robert Tibshirani published in 2008"


Journal ArticleDOI
TL;DR: Using a coordinate descent procedure for the lasso, a simple algorithm is developed that solves a 1000-node problem in at most a minute and is 30-4000 times faster than competing methods.
Abstract: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm--the graphical lasso--that is remarkably fast: it solves a 1000-node problem (approximately 500,000 parameters) in at most a minute and is 30-4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
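
A minimal sketch of the estimator in scikit-learn, whose GraphicalLasso implements this coordinate descent approach; the penalty value and the synthetic data below are illustrative:

```python
# Sparse inverse covariance estimation with the graphical lasso.
# alpha is the lasso penalty on the precision matrix (illustrative).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))        # 200 samples, 20 variables

model = GraphicalLasso(alpha=0.1).fit(X)
precision = model.precision_              # estimated inverse covariance

# Nonzero off-diagonal entries of the precision matrix define the
# edges of the estimated sparse graph.
edges = np.abs(precision) > 1e-8
np.fill_diagonal(edges, False)
print(f"{edges.sum() // 2} edges among {X.shape[1]} nodes")
```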

5,577 citations


Journal ArticleDOI
TL;DR: The fused lasso criterion leads to a convex optimization problem, and a fast algorithm is provided for its solution, which generally outperforms competing methods for calling gains and losses in CGH data.
Abstract: We apply the “fused lasso” regression method of Tibshirani and others (2004) to the problem of “hotspot detection”, in particular, detection of regions of gain or loss in comparative genomic hybridization (CGH) data. The fused lasso criterion leads to a convex optimization problem, and we provide a fast algorithm for its solution. Estimates of false-discovery rate are also provided. Our studies show that the new method generally outperforms competing methods for calling gains and losses in CGH data.
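
As a hedged sketch of the convex criterion, the fused lasso for a one-dimensional CGH profile can be posed directly in cvxpy; the penalty weights and the synthetic "gain" segment are illustrative, and this is a generic solver, not the paper's fast algorithm:

```python
# Fused lasso signal approximation: an L1 penalty on the coefficients
# plus an L1 penalty on their first differences, which yields sparse,
# piecewise-flat estimates suited to calling gains/losses.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
y = np.concatenate([np.zeros(40), 1.5 * np.ones(20), np.zeros(40)])
y += 0.3 * rng.standard_normal(y.size)    # noisy CGH-like profile

beta = cp.Variable(y.size)
lam1, lam2 = 0.1, 2.0                      # sparsity / fusion penalties
obj = (0.5 * cp.sum_squares(y - beta)
       + lam1 * cp.norm1(beta)
       + lam2 * cp.norm1(cp.diff(beta)))
cp.Problem(cp.Minimize(obj)).solve()

calls = np.flatnonzero(np.asarray(beta.value) > 0.5)
print("probes called as gains:", calls)
```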

377 citations


Journal ArticleDOI
01 Dec 2008-Blood
TL;DR: Although an important component of the biology of a malignant cell is inherited from its nontransformed cellular progenitor (GC centroblasts), aberrant miRNA expression is acquired upon cell transformation; expression of some of the miRNAs in this signature is correlated with clinical outcome in uniformly treated DLBCL patients.

254 citations


Journal ArticleDOI
TL;DR: It is concluded that LMO2 protein expression is a prognostic marker in DLBCL patients treated with anthracycline-based regimens alone or in combination with rituximab.
Abstract: Purpose The heterogeneity of diffuse large B-cell lymphoma (DLBCL) has prompted the search for new markers that can accurately separate prognostic risk groups. We previously showed in a multivariate model that LMO2 mRNA was a strong predictor of superior outcome in DLBCL patients. Here, we tested the prognostic impact of LMO2 protein expression in DLBCL patients treated with anthracycline-based chemotherapy with or without rituximab. Patients and Methods DLBCL patients treated with anthracycline-based chemotherapy alone (263 patients) or with the addition of rituximab (80 patients) were studied using immunohistochemistry for LMO2 on tissue microarrays of original biopsies. Staining results were correlated with outcome. Results In anthracycline-treated patients, LMO2 protein expression was significantly correlated with improved overall survival (OS) and progression-free survival (PFS) in univariate analyses (OS, P = .018; PFS, P = .010) and was a significant predictor independent of the clinical International Prognostic Index.

163 citations


Journal ArticleDOI
TL;DR: This work proposes a method for variable selection that first estimates the regression function, yielding a "preconditioned" response variable, and shows that under a certain Gaussian latent variable model, application of the LASSO to the preconditioned response variable is consistent as the number of predictors and observations increases.
Abstract: We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "preconditioned" response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the preconditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the preconditioned response variable is consistent as the number of predictors and observations increases. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.
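
A hedged sketch of the two-step idea: form a denoised, "preconditioned" response from leading components of the predictors (plain PCA below stands in for the paper's supervised principal components), then run the lasso on it. All settings are illustrative:

```python
# Preconditioning for variable selection when p >> n:
# step 1 estimates the regression function, step 2 selects variables.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
y = X[:, :5].sum(axis=1) + rng.standard_normal(n)

# Step 1: regress y on a few leading components to get a denoised y_hat.
Z = PCA(n_components=3).fit_transform(X)
y_hat = LinearRegression().fit(Z, y).predict(Z)

# Step 2: apply the lasso to the preconditioned response y_hat.
coef = Lasso(alpha=0.05).fit(X, y_hat).coef_
print("selected predictors:", np.flatnonzero(coef))
```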

117 citations


Journal ArticleDOI
TL;DR: A meta-analysis of data from thousands of microarrays for humans, mice, and fruit flies finds millions of implication relationships between genes that would be missed by other methods.
Abstract: We describe a method for extracting Boolean implications (if-then relationships) in very large amounts of gene expression microarray data. A meta-analysis of data from thousands of microarrays for humans, mice, and fruit flies finds millions of implication relationships between genes that would be missed by other methods. These relationships capture gender differences, tissue differences, development, and differentiation. New relationships are discovered that are preserved across all three species.
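
A hedged sketch of testing a single Boolean implication between one gene pair: binarize expression at a threshold and declare "A high implies B high" when the violating cell of the 2x2 contingency table is nearly empty. The thresholds and cutoff are illustrative, not the paper's test statistics:

```python
# Boolean implication mining between two genes, reduced to one pair.
import numpy as np

def implies(a_high, b_high, max_violation=0.05):
    """Test 'a_high => b_high' over boolean arrays of binarized samples."""
    violations = np.sum(a_high & ~b_high)   # A high but B low
    support = np.sum(a_high)
    return support > 0 and violations / support <= max_violation

rng = np.random.default_rng(2)
expr_a = rng.standard_normal(500)
expr_b = expr_a + 0.1 * rng.standard_normal(500)   # B tracks A
print(implies(expr_a > 0.5, expr_b > 0))            # likely True
```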

117 citations


Journal ArticleDOI
TL;DR: Multiplex PLA using either matched monoclonal antibodies or single batches of polyclonal antibody should prove useful for identifying and validating sets of putative disease biomarkers and finding multimarker panels.
Abstract: Background: Sensitive methods are needed for biomarker discovery and validation. We tested one promising technology, multiplex proximity ligation assay (PLA), in a pilot study profiling plasma biomarkers in pancreatic and ovarian cancer. Methods: We used 4 panels of 6- and 7-plex PLAs to detect biomarkers, with each assay consuming 1 μL plasma and using either matched monoclonal antibody pairs or single batches of polyclonal antibody. Protein analytes were converted to unique DNA amplicons by proximity ligation and subsequently detected by quantitative PCR. We profiled 18 pancreatic cancer cases and 19 controls and 19 ovarian cancer cases and 20 controls for the following proteins: a disintegrin and metalloprotease 8, CA-125, CA 19-9, carboxypeptidase A1, carcinoembryonic antigen, connective tissue growth factor, epidermal growth factor receptor, epithelial cell adhesion molecule, Her2, galectin-1, insulin-like growth factor 2, interleukin-1α, interleukin-7, mesothelin, macrophage migration inhibitory factor, osteopontin, secretory leukocyte peptidase inhibitor, tumor necrosis factor α, vascular endothelial growth factor, and chitinase 3–like 1. Probes for CA-125 were present in 3 of the multiplex panels. We measured plasma concentrations of the CA-125–mesothelin complex by use of a triple-specific PLA with 2 ligation events among 3 probes. Results: The assays displayed consistent measurements of CA-125 independent of which other markers were simultaneously detected and showed good correlation with Luminex data. In comparison to literature reports, we achieved expected results for other putative markers. Conclusion: Multiplex PLA using either matched monoclonal antibodies or single batches of polyclonal antibody should prove useful for identifying and validating sets of putative disease biomarkers and finding multimarker panels.

110 citations


Journal ArticleDOI
TL;DR: It is demonstrated that 2 IFN-I signaling molecules, IFN regulatory factor 9 (IRF9) and STAT1, were required for the production of IgG autoantibodies in the pristane-induced mouse model of SLE; the results suggest that IFN-I is upstream of TLR signaling in the activation of autoreactive B cells in SLE.
Abstract: A hallmark of SLE is the production of high-titer, high-affinity, isotype-switched IgG autoantibodies directed against nucleic acid-associated antigens. Several studies have established a role for both type I IFN (IFN-I) and the activation of TLRs by nucleic acid-associated autoantigens in the pathogenesis of this disease. Here, we demonstrate that 2 IFN-I signaling molecules, IFN regulatory factor 9 (IRF9) and STAT1, were required for the production of IgG autoantibodies in the pristane-induced mouse model of SLE. In addition, levels of IgM autoantibodies were increased in pristane-treated Irf9 -/- mice, suggesting that IRF9 plays a role in isotype switching in response to self antigens. Upregulation of TLR7 by IFN-alpha was greatly reduced in Irf9 -/- and Stat1 -/- B cells. Irf9 -/- B cells were incapable of being activated through TLR7, and Stat1 -/- B cells were impaired in activation through both TLR7 and TLR9. These data may reveal a novel role for IFN-I signaling molecules in both TLR-specific B cell responses and production of IgG autoantibodies directed against nucleic acid-associated autoantigens. Our results suggest that IFN-I is upstream of TLR signaling in the activation of autoreactive B cells in SLE.

95 citations


Journal ArticleDOI
01 May 2008-Blood
TL;DR: In this paper, gene-expression patterns correlated with FLT3-ITD mutation were identified, and the utility of an FLT3 signature for prognostication was evaluated.

94 citations


Journal ArticleDOI
TL;DR: In this article, the authors showed that increased tumor vascularity was associated with poor overall survival (P = 0.047), independent of the International Prognostic Index, whereas high expression of vascular endothelial growth factor receptor-1 by lymphoma cells was associated with improved overall survival.

83 citations


Journal ArticleDOI
TL;DR: There are tantalizing similarities between the Dantzig Selector (DS) and the LARS methods, but they are not the same and produce somewhat different models.
Abstract: Discussion of "The Dantzig selector: Statistical estimation when $p$ is much larger than $n$" [math/0506081].
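
A hedged sketch of the comparison raised in the TL;DR: the Dantzig selector posed as a linear program in cvxpy, next to a LARS lasso fit on the same data. The penalty level is illustrative, and the alpha scaling for LassoLars is only roughly matched:

```python
# Dantzig selector vs. lasso/LARS on the same design.
import cvxpy as cp
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(3)
n, p = 50, 100
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.standard_normal(n)

lam = 1.0
beta = cp.Variable(p)
# DS: minimize ||beta||_1 subject to a sup-norm bound on the correlation
# of the residual with the predictors.
cons = [cp.norm(X.T @ (y - X @ beta), "inf") <= lam]
cp.Problem(cp.Minimize(cp.norm1(beta)), cons).solve()

lars = LassoLars(alpha=lam / n).fit(X, y)   # rough penalty matching
print("DS support:  ", np.flatnonzero(np.abs(beta.value) > 1e-6))
print("LARS support:", np.flatnonzero(lars.coef_))
```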

ComponentDOI
TL;DR: A theoretical framework under which LPC is the logical choice for identifying significant genes is presented, and it is shown that LPC can provide a marked reduction in false discovery rates over the conventional methods on both real and simulated data.
Abstract: We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially-expressed genes in a microarray experiment. We wish to identify genes that are associated with some type of outcome, such as survival time or cancer type. We propose a new procedure, called Lassoed Principal Components (LPC), that builds upon existing methods and can provide a sizable improvement. For instance, in the case of two-class data, a standard (albeit simple) approach might be to compute a two-sample t-statistic for each gene. The LPC method involves projecting these conventional gene scores onto the eigenvectors of the gene expression data covariance matrix and then applying an L1 penalty in order to de-noise the resulting projections. We present a theoretical framework under which LPC is the logical choice for identifying significant genes, and we show that LPC can provide a marked reduction in false discovery rates over the conventional methods on both real and simulated data. Moreover, this flexible procedure can be applied to a variety of types of data and can be used to improve many existing methods for the identification of significant features.
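
A hedged sketch of the LPC recipe as described in the abstract: compute per-gene t-statistics, project them onto eigenvectors of the expression covariance, soft-threshold the projections (the L1 step), and map back to de-noised gene scores. The tuning choices are illustrative:

```python
# Lassoed Principal Components (LPC), reduced to its basic steps.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 300))            # samples x genes
y = np.repeat([0, 1], 20)                      # two classes

t_scores = ttest_ind(X[y == 0], X[y == 1]).statistic

# Eigenvectors of the gene expression covariance matrix, via SVD.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

proj = Vt @ t_scores                           # project gene scores
lam = 1.0                                      # illustrative penalty
proj = np.sign(proj) * np.maximum(np.abs(proj) - lam, 0)
lpc_scores = Vt.T @ proj                       # de-noised gene scores
print("top genes:", np.argsort(-np.abs(lpc_scores))[:10])
```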

Journal ArticleDOI
TL;DR: The findings identify hCAP-D3 as a new biomarker for subtype-1 tumors that improves prognostication, and reveal androgen signaling as an important biologic feature of this potentially clinically favorable molecular subtype.
Abstract: Growing evidence suggests that only a fraction of prostate cancers detected clinically are potentially lethal. An important clinical issue is identifying men with indolent cancer who might be spared aggressive therapies with associated morbidities. Previously, using microarray analysis, we defined 3 molecular subtypes of prostate cancer.

Journal ArticleDOI
TL;DR: A database of genes that are modulated in WBCs and splenocytes at sequential time points after burn or T-H in mice is provided and reveals that relatively few leukocyte genes are expressed in common after these two forms of injury.
Abstract: A primary objective of the large collaborative project entitled “Inflammation and the Host Response to Injury” was to identify leukocyte genes that are differentially expressed after two different forms of injury, burn and trauma-hemorrhage (T-H), in mice.

Journal ArticleDOI
TL;DR: This work proposes a procedure called complementary hierarchical clustering that is designed to uncover the structures arising from these novel genes that are not as highly expressed in breast cancer patients.
Abstract: When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly differentially expressed genes that have closely related expression patterns. Sometimes, these genes may not be relevant to the biological process under study or their functions may already be known. The problem is that these genes can potentially drown out the effects of other genes that are relevant or have novel functions. We propose a procedure called complementary hierarchical clustering that is designed to uncover the structures arising from these novel genes that are not as highly expressed. Simulation studies show that the procedure is effective when applied to a variety of examples. We also define a concept called relative gene importance that can be used to identify the influential genes in a given clustering. Finally, we analyze a microarray data set from 295 breast cancer patients, using clustering with the correlation-based distance measure. The complementary clustering reveals a grouping of the patients which is uncorrelated with a number of known prognostic signatures and significantly differing distant metastasis-free probabilities.
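
A hedged sketch of the underlying idea: remove the dominant structure from the expression matrix (here the leading singular component, a simple stand-in for the paper's construction) and re-cluster patients on the residual with a correlation-based distance:

```python
# Complementary clustering: cluster what remains after projecting out
# the structure that dominates the first clustering.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 500))            # patients x genes

# Project out the leading component driving the dominant grouping.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_resid = X - np.outer(U[:, 0] * s[0], Vt[0])

# Hierarchical clustering with correlation distance, as in the paper.
Z = linkage(pdist(X_resid, metric="correlation"), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print("complementary cluster sizes:", np.bincount(labels)[1:])
```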

Journal ArticleDOI
TL;DR: In this article, a permutation test was proposed to determine if the inferences drawn from pre-validated predictions are valid, which was shown to have the same power as the one-degree-of-freedom analytical test.
Abstract: Given a predictor of outcome derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor for disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine if the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward “one degree of freedom” analytical test from pre-validation can be biased and we propose a permutation test to remedy this problem. In simulation studies, we show that the permutation test has the nominal level and achieves roughly the same power as the analytical test.
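
A minimal sketch of the proposed remedy: a permutation test that compares the observed association between the pre-validated predictor and the outcome to its permutation null. The correlation statistic and the number of permutations are illustrative:

```python
# Permutation test for a pre-validated predictor: permute the predictor,
# recompute the association with the outcome, and report the fraction of
# permutations at least as extreme as the observed statistic.
import numpy as np

def permutation_pvalue(prevalidated, outcome, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(prevalidated, outcome)[0, 1])
    null = np.array([
        abs(np.corrcoef(rng.permutation(prevalidated), outcome)[0, 1])
        for _ in range(n_perm)
    ])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```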

Journal Article
TL;DR: Treelets, as discussed by the authors, extend wavelets to nonsmooth signals and return a hierarchical tree and an orthonormal basis, both of which reflect the internal structure of the data; they are especially well suited as a dimensionality reduction and feature selection tool prior to regression and classification.
Abstract: In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered--with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this paper we present treelets--a novel construction of multi-scale bases that extends wavelets to nonsmooth signals. The method is fully adaptive, as it returns a hierarchical tree and an orthonormal basis which both reflect the internal structure of the data. Treelets are especially well-suited as a dimensionality reduction and feature selection tool prior to regression and classification, in situations where sample sizes are small and the data are sparse with unknown groupings of correlated or collinear variables. The method is also simple to implement and analyse theoretically. Here we describe a variety of situations where treelets perform better than principal component analysis, as well as some common variable selection and cluster averaging schemes. We illustrate treelets on a blocked covariance model and on several data sets (hyperspectral image data, DNA microarray data, and internet advertisements) with highly complex dependencies between variables.
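
A hedged, simplified sketch of one treelet construction pass: repeatedly rotate the most correlated pair of active variables with a local 2x2 PCA (a Jacobi rotation), keep the high-variance "sum" variable active, and retire the "difference" variable. This omits the paper's refinements and bookkeeping:

```python
# Simplified treelet construction over the columns of X (samples x vars).
import numpy as np

def treelet(X, n_levels):
    X = X.copy()
    active = list(range(X.shape[1]))
    tree = []                                     # (sum_var, diff_var, rotation)
    for _ in range(n_levels):
        C = np.corrcoef(X[:, active], rowvar=False)
        np.fill_diagonal(C, 0.0)
        i, j = np.unravel_index(np.abs(C).argmax(), C.shape)
        a, b = active[i], active[j]
        # Local 2x2 PCA of the most correlated pair.
        _, V = np.linalg.eigh(np.cov(X[:, [a, b]], rowvar=False))
        X[:, [a, b]] = X[:, [a, b]] @ V[:, ::-1]  # "sum" variable first
        tree.append((a, b, V))
        active.remove(b)                          # retire "difference" variable
    return X, tree

rng = np.random.default_rng(6)
basis, merges = treelet(rng.standard_normal((100, 8)), n_levels=5)
```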

Journal ArticleDOI
01 Oct 2008-JAMA
TL;DR: The authors rejected the view that decency requires treating the father’s false beliefs in the medical benefits of a Chinese remedy for his dead child as true, but allowed that, under integrity-preserving and nonburdensome circumstances, it is decent for physicians to accommodate some false medical beliefs of living patients.
Abstract: In the actual case, the family did consider but in the end declined to donate—a point that was not raised in the Grand Rounds. Dr McCullough does not disagree with the article’s conclusion or grounds: that the medical treatment of a dead body is a farce that violates professional integrity, hinders the acceptance of brain death, and is an unreasonable expenditure of public resources and professional effort. Rather, he objects that our discussion is not needed because the ethics of this case are straightforward. In contrast, the Grand Rounds discussant (Dr Applbaum) wondered why so many thoughtful, smart, and conscientious physicians and ethicists—not one of whom questioned that brain death is death—thought it a hard case and why others thought it a straightforward case in the other direction. Hence, we saw the need for an analysis McCullough finds unnecessary. “Patient” in the article usually is a definite description referring to the young woman, not to her normative status, role, or relation, and “life support” is a noun, not a success verb. So disconnecting a dead patient from life support is no more contradictory than unzipping a dead sailor from a life vest. The philosophical issue is not one of definitions. The case presents an open normative question: are physicians ever permitted or required to treat those declared dead on neurological criteria? Calling the entity at issue a cadaver (we ourselves call it on occasion a corpse) does not settle the matter. Finally, we rejected the view that decency requires treating the father’s false beliefs in the medical benefits of a Chinese remedy for his dead child as true. Indeed, our central claim is that individuals are not entitled to have others act as if they are alive when by public criteria they are dead. Rather, we allow that, under integrity-preserving and nonburdensome circumstances, it is decent for physicians to accommodate some false medical beliefs of living patients, and physicians are permitted to accommodate the religious beliefs of dead patients when doing so would not be construed as endorsing false medical beliefs or denying public criteria of death. It would indeed be disrespectful to insincerely express endorsement of another’s false beliefs. But the article asks about action, not expression. Respect for autonomy often requires that we not thwart, and sometimes even requires that we assist, the mistaken plans of others.

Journal ArticleDOI
TL;DR: Tumor volume was a powerful predictor of recurrence in men after radical prostatectomy with moderate and high risk features, even after accounting for the effects of percentage Gleason pattern 4/5 cancer, extracapsular extension, seminal vesicle invasion and lymph node metastasis.
Abstract: Purpose: For screening to make an impact on prostate cancer mortality, detection of potentially lethal cancers at an early stage when they are low volume should result in improved recurrence and death rates after treatment. Patients and Methods: The effect of tumor volume on prostate cancer recurrence and death was evaluated in 764 men who underwent radical prostatectomy between 1984 and 2004, with particular attention focused on patients with moderate and high risk features. Results: Tumor volume was a powerful predictor of recurrence in men after radical prostatectomy with moderate and high risk features, even after accounting for the effects of percentage Gleason pattern 4/5 cancer, extracapsular extension, seminal vesicle invasion, lymph node metastasis, pre-operative PSA, and surgical margin involvement. In a subset of 159 patients for whom pre-operative PSA velocity was available, tumor volume predicted recurrence in those in the highest risk category (PSAV > 2 ng/ml/yr). Tumor volume, along with percent grade 4/5 and positive surgical margins, was significantly associated with prostate cancer specific death. Conclusions: The association of volume with outcome after radical prostatectomy, particularly in high risk patients, suggests that screening has made a positive impact on prostate cancer mortality. Future screening efforts should be directed at finding cancers with moderate and high risk features at low volume.

Journal ArticleDOI
TL;DR: In this issue of Journal of Clinical Oncology, Bhojwani et al report a study of gene expression profiling to predict early response and long-term outcome in children with high risk, pre–B-cell acute lymphoblastic leukemia (ALL).
Abstract: In this issue of Journal of Clinical Oncology, Bhojwani et al report a study of gene expression profiling to predict early response and long-term outcome in children with high risk, pre–B-cell acute lymphoblastic leukemia (ALL). The study used subsets from a group of 99 patients treated on Children’s Oncology Group Study 1961. The 82 patients who were considered either rapid or slow early responders (RER and SER) were randomly assigned into training sets (28 and 26 patients) and test sets (14 in each group). Only 59 patients were available for the long-term outcome analysis, 28 patients with continuous remission for at least 4 years and 31 patients with relapse within 3 years. A 24-gene signature was found to predict for early response, and a different 41-gene signature predicted long-term outcome. However, the three- and five-gene models that the authors derived were ultimately not more informative than existing clinical prognostic factors (age, WBC count, and karyotype). The predictive value of these models was better than chance but fell short of a truly useful step forward. The authors suggest that the failure of their expression-based predictors to have independent significance might be related to the possibility that the most important factors have been already identified. However, strong predictors such as certain chromosomal translocations may obscure more subtle prognostic factors, such as expression signatures. One potentially useful approach to this problem would be to stratify or study such subgroups separately for genomic analyses, an admittedly tall order in relatively uncommon diseases such as childhood ALL. This report is one of several recently published genomic studies in childhood ALL. Because of the relative scarcity of pediatric cancers, all of these studies have the limitation of small numbers of patients and low statistical power. Nonetheless, as with many other types of cancer, genomic studies of childhood ALL have provided new biologic insights into the pathogenesis and classification of the disease and insights into determinants of success or failure of therapies. The major translational goals of such studies are to stratify patients by more precise prognostic risk groups and to eventually select therapies based on more accurate prediction of drug responsiveness. Variations in treatment protocols among various studies and the biologic heterogeneity of childhood leukemias further complicate the analyses of genomic data in this disease. Thus Flotho et al recently reported that a 14-gene signature, which included several genes associated with cell proliferation, was predictive of outcome in 286 patients with childhood ALL treated with the Total XIII protocol. The same signature was confirmed to segregate a separate cohort of 127 Australian patients with ALL into two groups by Catchpoole et al, but did not have prognostic value in those patients who were treated with other regimens studied in BFM95 and ANZCHOG VIII. A fundamental problem in current genomic translational research is the traditional paradigm of sifting data to identify one or a few markers to use prospectively for prognosis of outcomes or prediction of therapies. If we are to fully exploit the potential of state-of-the-art genetic and genomic technologies for translational research and, ultimately, to individualize cancer therapies, this paradigm needs to fundamentally change.
Rather than striving for simplification, the goal of genomic analyses should be to develop analytic approaches that use multidimensional data sets and, as West et al have suggested, “embrace the complexity of genomic data for personalized medicine.” We offer the following recommendations to meet this challenge. First, sufficient numbers of specimens should be gathered from well-annotated, uniformly treated patients to allow statistical power in the analyses. For relatively uncommon cancers, including childhood leukemias, this is a daunting task, and will require cooperation among various consortia and major centers. Pooling of data from multiple studies, which requires statistical cross-validation of the genomic platforms, is a feasible approach to increasing statistical power. Second, as much as possible, cancer genomic studies should stratify patients into homogeneous groups based on known clinical and pathologic prognostic factors, as well as treatment protocols, so that more subtle, underlying factors can be discerned. The number of patients available for study will again be a major issue for adequate statistical power of this approach. Third, translational research should harness the power of high throughput technologies for the acquisition of large-scale databases on each specimen, which would include genome-wide studies of gene copy number, mutations, and polymorphisms, as well as gene expression. The availability of substantial amounts of fresh tumor cells from hematologic malignancies, such as childhood leukemias, makes such studies more feasible than for solid tumors such as lung cancers, where scant diagnostic tissue is available from fine-needle aspirations or bronchoscopies. Fourth, various bioinformatic approaches should be developed and used to integrate these data sets and to develop models that reflect the complex systems biology of individual cancers. These approaches include uncovering the activity of networks of signaling and other pathways.

Proceedings ArticleDOI
24 Aug 2008
TL;DR: This talk presents some effective algorithms based on coordinate descent for fitting large scale regularization paths for a variety of problems.
Abstract: In a statistical world faced with an explosion of data, regularization has become an important ingredient. In a wide variety of problems we have many more input features than observations, and the lasso penalty and its hybrids have become increasingly useful for both feature selection and regularization. This talk presents some effective algorithms based on coordinate descent for fitting large scale regularization paths for a variety of problems.
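
A minimal sketch of the core building block behind these algorithms: cyclic coordinate descent for the lasso objective (1/2n)||y - Xb||^2 + lam*||b||_1, written for generic predictors. The penalty level and iteration count are illustrative:

```python
# Cyclic coordinate descent for the lasso: update one coefficient at a
# time by soft-thresholding its univariate least-squares solution.
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed.
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = (soft_threshold(X[:, j] @ r / n, lam)
                       / (X[:, j] @ X[:, j] / n))
    return beta

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 50))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)
print("selected:", np.flatnonzero(lasso_cd(X, y, lam=0.1)))
```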

01 Jan 2008
TL;DR: These myriad interpretations of AdaBoost form a robust theory of the algorithm that provides understanding from an extraordinary range of points of view in which each perspective tells us something unique about the algorithm.
Abstract: For such a simple algorithm, it is fascinating and remarkable what a rich diversity of interpretations, views, perspectives and explanations have emerged of AdaBoost. Originally, AdaBoost was proposed as a “boosting” algorithm in the technical sense of the word: given access to “weak” classifiers, just slightly better in performance than random guessing, and given sufficient data, a true boosting algorithm can provably produce a combined classifier with nearly perfect accuracy (Freund and Schapire, 1997). AdaBoost has this property, but it also has been shown to be deeply connected with a surprising range of other topics, such as game theory, on-line learning, linear programming, logistic regression and maximum entropy (Breiman, 1999; Collins et al., 2002; Demiriz et al., 2002; Freund and Schapire, 1996, 1997; Kivinen and Warmuth, 1999; Lebanon and Lafferty, 2002). As we discuss further below, AdaBoost can be seen as a method for maximizing the “margins” or confidences of the predictions made by its generated classifier (Schapire et al., 1998). The current paper by Mease and Wyner, of course, focuses on another perspective, the so-called statistical view of boosting. This interpretation, particularly as expounded by Friedman et al. (2000), focuses on the algorithm as a stagewise procedure for minimizing the exponential loss function, which is related to the loss minimized in logistic regression, and whose minimization can be viewed, in a certain sense, as providing estimates of the conditional probability of the label. Taken together, these myriad interpretations of AdaBoost form a robust theory of the algorithm that provides understanding from an extraordinary range of points of view, in which each perspective tells us something unique about the algorithm. The statistical view, for instance, has been of tremendous value, allowing for the practical conversion of AdaBoost’s predictions into conditional probabilities, as well as the algorithm’s generalization and extension to many other loss functions and learning problems. Still, each perspective has its weaknesses, which are important to identify to keep our theory in touch with reality. The current paper is superb in exposing empirical phenomena that are apparently difficult to understand according to the statistical view. From a theoretical perspective, the statistical interpretation has other weaknesses. As discussed by Mease and Wyner, this interpretation does not explain AdaBoost’s observed tendency not to overfit, particularly in the absence of regularization.
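
A minimal sketch of AdaBoost in scikit-learn, whose default weak learner is a depth-1 decision stump; it illustrates the weak-to-strong boosting property described above. The data and settings are illustrative:

```python
# AdaBoost: combine many weak classifiers into a strong one via
# stagewise reweighting of training examples.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```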


Journal ArticleDOI
TL;DR: Discussion of "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]
Abstract: Discussion of "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]
