Showing papers by "Robert Tibshirani published in 2017"

PDF

Open Access

Journal Article•DOI•

[...]

Nima Aghaeepour¹, Edward A. Ganio¹, David R. McIlwain¹, Amy S. Tsai¹, Martha Tingle¹, Sofie Van Gassen², Dyani Gaudilliere¹, Quentin Baca¹, Leslie McNeil¹, Robin Okada¹, Mohammad Sajjad Ghaemi¹, David Furman³, David Furman¹, Ronald J. Wong¹, Virginia D. Winn¹, Maurice L. Druzin¹, Yaser Y. El-Sayed¹, Cecele C. Quaintance¹, Ronald S. Gibbs¹, Gary L. Darmstadt¹, Gary M. Shaw¹, David K. Stevenson¹, Robert Tibshirani¹, Garry P. Nolan¹, David B. Lewis¹, Martin S. Angst¹, Brice Gaudilliere¹ - Show less +23 more•Institutions (3)

Stanford University¹, Ghent University², National Scientific and Technical Research Council³

01 Sep 2017-Science immunology

TL;DR: These findings unravel the precise timing of immunological events occurring during a term pregnancy and provide the analytical framework to identify immunological deviations implicated in pregnancy-related pathologies.

...read moreread less

Abstract: The maintenance of pregnancy relies on finely tuned immune adaptations. We demonstrate that these adaptations are precisely timed, reflecting an immune clock of pregnancy in women delivering at term. Using mass cytometry, the abundance and functional responses of all major immune cell subsets were quantified in serial blood samples collected throughout pregnancy. Cell signaling-based Elastic Net, a regularized regression method adapted from the elastic net algorithm, was developed to infer and prospectively validate a predictive model of interrelated immune events that accurately captures the chronology of pregnancy. Model components highlighted existing knowledge and revealed previously unreported biology, including a critical role for the interleukin-2-dependent STAT5ab signaling pathway in modulating T cell function during pregnancy. These findings unravel the precise timing of immunological events occurring during a term pregnancy and provide the analytical framework to identify immunological deviations implicated in pregnancy-related pathologies.

...read moreread less

330 citations

Posted Content•

Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso

[...]

Trevor Hastie, Robert Tibshirani, Ryan J. Tibshirani

27 Jul 2017-arXiv: Methodology

TL;DR: An expanded set of simulations showed that the classical best subset selection problem in regression modeling can be formulated as a mixed integer optimization (MIO) problem, and that the relaxed lasso is the overall winner, performing just about as well as the lasso in low SNR scenarios, and as much asbest subset selection in highSNR scenarios.

...read moreread less

Abstract: In exciting new work, Bertsimas et al. (2016) showed that the classical best subset selection problem in regression modeling can be formulated as a mixed integer optimization (MIO) problem. Using recent advances in MIO algorithms, they demonstrated that best subset selection can now be solved at much larger problem sizes that what was thought possible in the statistics community. They presented empirical comparisons of best subset selection with other popular variable selection procedures, in particular, the lasso and forward stepwise selection. Surprisingly (to us), their simulations suggested that best subset selection consistently outperformed both methods in terms of prediction accuracy. Here we present an expanded set of simulations to shed more light on these comparisons. The summary is roughly as follows: (a) neither best subset selection nor the lasso uniformly dominate the other, with best subset selection generally performing better in high signal-to-noise (SNR) ratio regimes, and the lasso better in low SNR regimes; (b) best subset selection and forward stepwise perform quite similarly throughout; (c) the relaxed lasso (actually, a simplified version of the original relaxed estimator defined in Meinshausen, 2007) is the overall winner, performing just about as well as the lasso in low SNR scenarios, and as well as best subset selection in high SNR scenarios.

...read moreread less

171 citations

Journal Article•DOI•

Diagnosis of Prostate Cancer by Desorption Electrospray Ionization Mass Spectrometric Imaging of Small Metabolites and Lipids

[...]

Shibdas Banerjee¹, Richard N. Zare¹, Robert Tibshirani¹, Christian A. Kunder¹, Rosalie Nolley¹, Richard E. Fan¹, James D. Brooks¹, Geoffrey A. Sonn¹ - Show less +4 more•Institutions (1)

Stanford University¹

28 Mar 2017-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Measurement of the glucose/citrate ion signal ratio accurately predicted cancer when this ratio exceeds 1.0 and normal prostate when the ratio is less than 0.5, indicated that the ratio of glucose to citrate ion signals could be used to accurately identify prostate cancer.

...read moreread less

Abstract: Accurate identification of prostate cancer in frozen sections at the time of surgery can be challenging, limiting the surgeon's ability to best determine resection margins during prostatectomy. We performed desorption electrospray ionization mass spectrometry imaging (DESI-MSI) on 54 banked human cancerous and normal prostate tissue specimens to investigate the spatial distribution of a wide variety of small metabolites, carbohydrates, and lipids. In contrast to several previous studies, our method included Krebs cycle intermediates (m/z <200), which we found to be highly informative in distinguishing cancer from benign tissue. Malignant prostate cells showed marked metabolic derangements compared with their benign counterparts. Using the "Least absolute shrinkage and selection operator" (Lasso), we analyzed all metabolites from the DESI-MS data and identified parsimonious sets of metabolic profiles for distinguishing between cancer and normal tissue. In an independent set of samples, we could use these models to classify prostate cancer from benign specimens with nearly 90% accuracy per patient. Based on previous work in prostate cancer showing that glucose levels are high while citrate is low, we found that measurement of the glucose/citrate ion signal ratio accurately predicted cancer when this ratio exceeds 1.0 and normal prostate when the ratio is less than 0.5. After brief tissue preparation, the glucose/citrate ratio can be recorded on a tissue sample in 1 min or less, which is in sharp contrast to the 20 min or more required by histopathological examination of frozen tissue specimens.

...read moreread less

154 citations

Journal Article•DOI•

Metabolic Markers and Statistical Prediction of Serous Ovarian Cancer Aggressiveness by Ambient Ionization Mass Spectrometry Imaging

[...]

Marta Sans¹, Kshipra M. Gharpure², Robert Tibshirani³, Jialing Zhang¹, Li Liang², Jinsong Liu², Jonathan H. Young¹, R.L. Dood², Anil K. Sood, Livia S. Eberlin¹ - Show less +6 more•Institutions (3)

University of Texas at Austin¹, University of Texas MD Anderson Cancer Center², Stanford University³

01 Jun 2017-Cancer Research

TL;DR: Desorption electrospray ionization (DESI) mass spectrometry (MS) was used to image and chemically characterize the metabolic profiles of HGSC, BOT, and normal ovarian tissue samples and suggest DESI-MS as a powerful approach for rapid serous ovarian cancer diagnosis based on altered metabolic signatures.

...read moreread less

Abstract: Ovarian high-grade serous carcinoma (HGSC) results in the highest mortality among gynecological cancers, developing rapidly and aggressively. Dissimilarly, serous borderline ovarian tumors (BOT) can progress into low-grade serous carcinomas and have relatively indolent clinical behavior. The underlying biological differences between HGSC and BOT call for accurate diagnostic methodologies and tailored treatment options, and identification of molecular markers of aggressiveness could provide valuable biochemical insights and improve disease management. Here, we used desorption electrospray ionization (DESI) mass spectrometry (MS) to image and chemically characterize the metabolic profiles of HGSC, BOT, and normal ovarian tissue samples. DESI-MS imaging enabled clear visualization of fine papillary branches in serous BOT and allowed for characterization of spatial features of tumor heterogeneity such as adjacent necrosis and stroma in HGSC. Predictive markers of cancer aggressiveness were identified, including various free fatty acids, metabolites, and complex lipids such as ceramides, glycerophosphoglycerols, cardiolipins, and glycerophosphocholines. Classification models built from a total of 89,826 individual pixels, acquired in positive and negative ion modes from 78 different tissue samples, enabled diagnosis and prediction of HGSC and all tumor samples in comparison with normal tissues, with overall agreements of 96.4% and 96.2%, respectively. HGSC and BOT discrimination was achieved with an overall accuracy of 93.0%. Interestingly, our classification model allowed identification of three BOT samples presenting unusual histologic features that could be associated with the development of low-grade carcinomas. Our results suggest DESI-MS as a powerful approach for rapid serous ovarian cancer diagnosis based on altered metabolic signatures. Cancer Res; 77(11); 2903-13. ©2017 AACR.

...read moreread less

91 citations

Journal Article•DOI•

Landscape of monoallelic DNA accessibility in mouse embryonic stem cells and neural progenitor cells.

[...]

Jin Xu¹, Ava C. Carter¹, Anne-Valerie Gendrel², Mikael Attia², Joshua R. Loftus¹, William J. Greenleaf¹, Robert Tibshirani¹, Edith Heard², Howard Y. Chang¹ - Show less +5 more•Institutions (2)

Stanford University¹, PSL Research University²

23 Jan 2017-Nature Genetics

TL;DR: Quantitative analysis indicated that allelic choice at the majority of RAMA elements is consistent with a stochastic process; however, up to 30% of RAMC elements may deviate from the expected pattern, suggesting a regulated or counting mechanism.

...read moreread less

Abstract: Howard Chang and colleagues use allele-specific ATAC–seq to profile active regulatory DNA across the genome in mouse embryonic stem cells and neural progenitor cells. They find that monoallelic DNA accessibility across autosomes is pervasive, developmentally programmed and composed of several patterns.

...read moreread less

78 citations

Journal Article•DOI•

A proteomic clock of human pregnancy.

[...]

Nima Aghaeepour¹, Benoit Lehallier¹, Quentin Baca¹, Edward A. Ganio¹, Ronald J. Wong¹, Mohammad Sajjad Ghaemi¹, Anthony Culos¹, Yasser Y. El-Sayed¹, Yair J. Blumenfeld¹, Maurice L. Druzin¹, Virginia D. Winn¹, Ronald S. Gibbs¹, Robert Tibshirani¹, Gary M. Shaw¹, David K. Stevenson¹, Brice Gaudilliere¹, Martin S. Angst¹ - Show less +13 more•Institutions (1)

Stanford University¹

01 Dec 2017-American Journal of Obstetrics and Gynecology

TL;DR: Results indicate that precisely timed changes in the plasma proteome during term pregnancy mirror a proteomic clock, and the exciting promise of such a clock is that deviations from its regular chronological profile may assist in the early diagnoses of pregnancy‐related pathologies, and point to underlying pathophysiology.

...read moreread less

75 citations

Journal Article•DOI•

Selecting the number of principal components: Estimation of the true rank of a noisy matrix

[...]

Yunjin Choi, Jonathan Taylor, Robert Tibshirani

01 Dec 2017-Annals of Statistics

TL;DR: In this article, the authors proposed distribution-based methods with exact type 1 error controls for hypothesis testing and construction of confidence intervals for signals in a noisy matrix with finite samples, assuming Gaussian noise, by utilizing a post-selection inference framework, and extending the approach of Taylor, Loftus and Tibshirani (2013) in a PCA setting.

...read moreread less

Abstract: Principal component analysis (PCA) is a well-known tool in multivariate statistics. One significant challenge in using PCA is the choice of the number of principal components. In order to address this challenge, we propose distribution-based methods with exact type 1 error controls for hypothesis testing and construction of confidence intervals for signals in a noisy matrix with finite samples. Assuming Gaussian noise, we derive exact type 1 error controls based on the conditional distribution of the singular values of a Gaussian matrix by utilizing a post-selection inference framework, and extending the approach of [Taylor, Loftus and Tibshirani (2013)] in a PCA setting. In simulation studies, we find that our proposed methods compare well to existing approaches.

...read moreread less

72 citations

Journal Article•DOI•

Chemical Space Mimicry for Drug Discovery.

[...]

William Yuan¹, Dadi Jiang², Dhanya Nambiar², Lydia P. Liew³, Michael P. Hay³, Joshua Bloomstein², Peter S. Lu², Brandon E. Turner², Quynh-Thu Le², Robert Tibshirani², Purvesh Khatri², Mark G. Moloney¹, Albert C. Koong² - Show less +9 more•Institutions (3)

University of Oxford¹, Stanford University², University of Auckland³

03 Apr 2017-Journal of Chemical Information and Modeling

TL;DR: Newly identified MIMICS-generated compounds were found to be bioactive as inhibitors of specific components of the unfolded protein response and the VEGFR2 pathway in cell-based assays, thus confirming the applicability of this methodology toward drug design applications.

...read moreread less

Abstract: We describe a new library generation method, Machine-based Identification of Molecules Inside Characterized Space (MIMICS), that generates sets of molecules inspired by a text-based input. MIMICS-generated libraries were found to preserve distributions of properties while simultaneously increasing structural diversity. Newly identified MIMICS-generated compounds were found to be bioactive as inhibitors of specific components of the unfolded protein response (UPR) and the VEGFR2 pathway in cell-based assays, thus confirming the applicability of this methodology toward drug design applications. Wider application of MIMICS could facilitate the efficient utilization of chemical space.

...read moreread less

64 citations

Journal Article•DOI•

Long-term course of patients with primary ocular adnexal MALT lymphoma: a large single-institution cohort study.

[...]

Amrita Desai¹, Madhura Joag², Lazaros J. Lekakis¹, Jennifer R. Chapman¹, Francisco Vega¹, Robert Tibshirani³, David T. Tse², Arnold M. Markoe, Izidore S. Lossos¹ - Show less +5 more•Institutions (3)

University of Miami¹, Bascom Palmer Eye Institute², Stanford University³

19 Jan 2017-Blood

TL;DR: It is demonstrated that POAMLs harbor a persistent and ongoing risk of relapse, including in the central nervous system, and transformation to aggressive lymphoma (4%), requiring long-term follow-up.

...read moreread less

52 citations

Journal Article•DOI•

Big data modeling to predict platelet usage and minimize wastage in a tertiary care system

[...]

Leying Guan¹, Xiaoying Tian¹, Saurabh Gombar¹, Allison Zemek¹, Gomathi Krishnan¹, Robert A. Scott¹, Balasubramanian Narasimhan¹, Robert Tibshirani¹, Tho D. Pham¹ - Show less +5 more•Institutions (1)

Stanford University¹

24 Oct 2017-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: A statistical model is demonstrated using hospital patient data to quantitatively forecast, days in advance, the need for platelet transfusions, and this approach can be leveraged to significantly decrease platelet wastage, and, if adopted nationwide, would save approximately 80 million dollars per year.

...read moreread less

Abstract: Maintaining a robust blood product supply is an essential requirement to guarantee optimal patient care in modern health care systems. However, daily blood product use is difficult to anticipate. Platelet products are the most variable in daily usage, have short shelf lives, and are also the most expensive to produce, test, and store. Due to the combination of absolute need, uncertain daily demand, and short shelf life, platelet products are frequently wasted due to expiration. Our aim is to build and validate a statistical model to forecast future platelet demand and thereby reduce wastage. We have investigated platelet usage patterns at our institution, and specifically interrogated the relationship between platelet usage and aggregated hospital-wide patient data over a recent consecutive 29-mo period. Using a convex statistical formulation, we have found that platelet usage is highly dependent on weekday/weekend pattern, number of patients with various abnormal complete blood count measurements, and location-specific hospital census data. We incorporated these relationships in a mathematical model to guide collection and ordering strategy. This model minimizes waste due to expiration while avoiding shortages; the number of remaining platelet units at the end of any day stays above 10 in our model during the same period. Compared with historical expiration rates during the same period, our model reduces the expiration rate from 10.5 to 3.2%. Extrapolating our results to the ∼2 million units of platelets transfused annually within the United States, if implemented successfully, our model can potentially save ∼80 million dollars in health care costs.

...read moreread less

45 citations

Journal Article•DOI•

Post‐selection point and interval estimation of signal sizes in Gaussian samples

[...]

Stephen Reid¹, Jonathan Taylor¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

01 Jun 2017-Canadian Journal of Statistics-revue Canadienne De Statistique

TL;DR: In this paper, the authors adapt recent developments by Lee et al. in post selection inference for the Lasso to the orthogonal setting, where sample elements have different underlying signal sizes.

...read moreread less

Abstract: We tackle the problem of the estimation of a vector of means from a single vector-valued observation $y$. Whereas previous work reduces the size of the estimates for the largest (absolute) sample elements via shrinkage (like James-Stein) or biases estimated via empirical Bayes methodology, we take a novel approach. We adapt recent developments by Lee et al (2013) in post selection inference for the Lasso to the orthogonal setting, where sample elements have different underlying signal sizes. This is exactly the setup encountered when estimating many means. It is shown that other selection procedures, like selecting the $K$ largest (absolute) sample elements and the Benjamini-Hochberg procedure, can be cast into their framework, allowing us to leverage their results. Point and interval estimates for signal sizes are proposed. These seem to perform quite well against competitors, both recent and more tenured. Furthermore, we prove an upper bound to the worst case risk of our estimator, when combined with the Benjamini-Hochberg procedure, and show that it is within a constant multiple of the minimax risk over a rich set of parameter spaces meant to evoke sparsity.

...read moreread less

Posted Content•

Synth-Validation: Selecting the Best Causal Inference Method for a Given Dataset

[...]

Alejandro Schuler, Kenneth Jung, Robert Tibshirani, Trevor Hastie, Nigam H. Shah - Show less +1 more

31 Oct 2017-arXiv: Machine Learning

TL;DR: This work proposes synth-validation, a procedure that estimates the estimation error of causal inference methods applied to a given dataset and applies each causal inference method to datasets sampled from these distributions and compares the effect estimates with the known effects to estimate error.

...read moreread less

Abstract: Many decisions in healthcare, business, and other policy domains are made without the support of rigorous evidence due to the cost and complexity of performing randomized experiments. Using observational data to answer causal questions is risky: subjects who receive different treatments also differ in other ways that affect outcomes. Many causal inference methods have been developed to mitigate these biases. However, there is no way to know which method might produce the best estimate of a treatment effect in a given study. In analogy to cross-validation, which estimates the prediction error of predictive models applied to a given dataset, we propose synth-validation, a procedure that estimates the estimation error of causal inference methods applied to a given dataset. In synth-validation, we use the observed data to estimate generative distributions with known treatment effects. We apply each causal inference method to datasets sampled from these distributions and compare the effect estimates with the known effects to estimate error. Using simulations, we show that using synth-validation to select a causal inference method for each study lowers the expected estimation error relative to consistently using any single method.

...read moreread less

Posted Content•

A Pliable Lasso

[...]

Robert Tibshirani¹, Jerome H. Friedman¹•Institutions (1)

Stanford University¹

01 Dec 2017-arXiv: Methodology

TL;DR: This article proposed a generalization of the lasso that allows the model coefficients to vary as a function of a general set of modifying variables, such as gender, age or time, and presented a computationally efficient algorithm for its optimization.

...read moreread less

Abstract: We propose a generalization of the lasso that allows the model coefficients to vary as a function of a general set of modifying variables. These modifiers might be variables such as gender, age or time. The paradigm is quite general, with each lasso coefficient modified by a sparse linear function of the modifying variables $Z$. The model is estimated in a hierarchical fashion to control the degrees of freedom and avoid overfitting. The modifying variables may be observed, observed only in the training set, or unobserved overall. There are connections of our proposal to varying coefficient models and high-dimensional interaction models. We present a computationally efficient algorithm for its optimization, with exact screening rules to facilitate application to large numbers of predictors. The method is illustrated on a number of different simulated and real examples.

...read moreread less

Journal Article•DOI•

A simple method for analyzing matched designs with double controls: McNemar's test can be extended.

[...]

Donald A. Redelmeier, Robert Tibshirani¹•Institutions (1)

Stanford University¹

01 Jan 2017-Journal of Clinical Epidemiology

TL;DR: The new approach provides a feasible, simple, and efficient method for analyzing matched designs with double controls and agrees closely with conditional logistic regression and are sufficiently simple as to be computed on a handheld calculator.

...read moreread less

Posted Content•

Some methods for heterogeneous treatment effect estimation in high-dimensions

[...]

Scott Powers¹, Junyang Qian¹, Kenneth Jung¹, Alejandro Schuler¹, Nigam H. Shah¹, Trevor Hastie¹, Robert Tibshirani¹ - Show less +3 more•Institutions (1)

Stanford University¹

01 Jul 2017-arXiv: Machine Learning

TL;DR: In this paper, the authors proposed and analyzed three methods for estimating heterogeneous treatment effects using observational data and applied them to data from a large randomized trial of a treatment for high blood pressure.

...read moreread less

Abstract: When devising a course of treatment for a patient, doctors often have little quantitative evidence on which to base their decisions, beyond their medical education and published clinical trials. Stanford Health Care alone has millions of electronic medical records (EMRs) that are only just recently being leveraged to inform better treatment recommendations. These data present a unique challenge because they are high-dimensional and observational. Our goal is to make personalized treatment recommendations based on the outcomes for past patients similar to a new patient. We propose and analyze three methods for estimating heterogeneous treatment effects using observational data. Our methods perform well in simulations using a wide variety of treatment effect functions, and we present results of applying the two most promising methods to data from The SPRINT Data Analysis Challenge, from a large randomized trial of a treatment for high blood pressure.

...read moreread less

Posted Content•

Sparse canonical correlation analysis

[...]

Xiaotong Suo, Victor Minden, Bradley J. Nelson, Robert Tibshirani, Michael A. Saunders - Show less +1 more

30 May 2017-arXiv: Machine Learning

TL;DR: This work proposes a sparse canonical correlation analysis by adding l1 constraints on the canonical vectors and shows how to solve it efficiently using linearized alternating direction method of multipliers (ADMM) and using TFOCS as a black box.

...read moreread less

Abstract: Canonical correlation analysis was proposed by Hotelling [6] and it measures linear relationship between two multidimensional variables In high dimensional setting, the classical canonical correlation analysis breaks down We propose a sparse canonical correlation analysis by adding l1 constraints on the canonical vectors and show how to solve it efficiently using linearized alternating direction method of multipliers (ADMM) and using TFOCS as a black box We illustrate this idea on simulated data

...read moreread less

Journal Article•DOI•

Log-ratio Lasso: Scalable, Sparse Estimation for Log-ratio Models

[...]

Stephen Bates¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

04 Sep 2017-arXiv: Methodology

TL;DR: An embedding of the log‐ratio parameter space into a space of much lower dimension is introduced and used as the foundation for a two‐step fitting procedure that combines a convex filtering step with a second non‐convex pruning step to yield highly sparse solutions.

...read moreread less

Abstract: Positive-valued signal data is common in many biological and medical applications, where the data are often generated from imaging techniques such as mass spectrometry. In such a setting, the relative intensities of the raw features are often the scientifically meaningful quantities, so it is of interest to identify relevant features that take the form of log-ratios of the raw inputs. When including the log-ratios of all pairs of predictors, the dimensionality of this predictor space becomes large, so computationally efficient statistical procedures are required. We introduce an embedding of the log-ratio parameter space into a space of much lower dimension and develop efficient penalized fitting procedure using this more tractable representation. This procedure serves as the foundation for a two-step fitting procedure that combines a convex filtering step with a second non-convex pruning step to yield highly sparse solutions. On a cancer proteomics data set we find that these methods fit highly sparse models with log-ratio features of known biological relevance while greatly improving upon the predictive accuracy of less interpretable methods.

...read moreread less

Journal Article•DOI•

KLHL6 Is Preferentially Expressed in Germinal Center–Derived B-Cell Lymphomas

[...]

Christian A. Kunder¹, Giovanna Roncador², Ranjana H. Advani¹, Gabriela Gualco, Carlos E. Bacchi, Jean Sabile¹, Izidore S. Lossos³, Kexin Nie¹, Robert Tibshirani¹, Michael R. Green⁴, Ash A. Alizadeh¹, Yasodha Natkunam¹ - Show less +8 more•Institutions (4)

Stanford University¹, Carlos III Health Institute², University of Miami³, University of Texas MD Anderson Cancer Center⁴

20 Nov 2017-American Journal of Clinical Pathology

TL;DR: KLHL6 immunohistochemistry may prove a useful adjunct in the diagnosis and future classification of B-cell lymphomas.

...read moreread less

Abstract: Objectives KLHL6 is a recently described BTB-Kelch protein with selective expression in lymphoid tissues and is most strongly expressed in germinal center B cells. Methods Using gene expression profiling as well as immunohistochemistry with an anti-KLHL6 monoclonal antibody, we have characterized the expression of this molecule in normal and neoplastic tissues. Protein expression was evaluated in 1,058 hematopoietic neoplasms. Results Consistent with its discovery as a germinal center marker, KLHL6 was positive mainly in B-cell neoplasms of germinal center derivation, including 95% of follicular lymphomas (106/112). B-cell lymphomas of non-germinal center derivation were generally negative (0/33 chronic lymphocytic leukemias/small lymphocytic lymphomas, 3/49 marginal zone lymphomas, and 2/66 mantle cell lymphomas). Conclusions In addition to other germinal center markers, including BCL6, CD10, HGAL, and LMO2, KLHL6 immunohistochemistry may prove a useful adjunct in the diagnosis and future classification of B-cell lymphomas.

...read moreread less

Posted Content•

Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball

[...]

Scott Powers¹, Trevor Hastie¹, Robert Tibshirani¹•Institutions (1)

Stanford University¹

30 Jun 2017-arXiv: Machine Learning

TL;DR: The authors proposed the nuclear norm penalty as an alternative to the ridge penalty for regularized multinomial regression, which has the advantage of leveraging underlying structure among the response categories to make better predictions.

...read moreread less

Abstract: We propose the nuclear norm penalty as an alternative to the ridge penalty for regularized multinomial regression. This convex relaxation of reduced-rank multinomial regression has the advantage of leveraging underlying structure among the response categories to make better predictions. We apply our method, nuclear penalized multinomial regression (NPMR), to Major League Baseball play-by-play data to predict outcome probabilities based on batter-pitcher matchups. The interpretation of the results meshes well with subject-area expertise and also suggests a novel understanding of what differentiates players.

...read moreread less

Journal Article•DOI•

Development of a Dynamic Model for Personalized Risk Assessment in Large B-Cell Lymphoma

[...]

David M. Kurtz¹, Florian Scherer¹, Michael C. Jin¹, Joanne Soo¹, Alexander F.M. Craig¹, Mohammad Shahrokh Esfahani¹, Jacob J. Chabon¹, Henning Stehr¹, Chih Long Liu¹, Robert Tibshirani¹, Lauren S. Maeda¹, Neel K. Gupta¹, Michael S. Khodadoust¹, Ranjana H. Advani¹, Ronald Levy¹, Aaron M. Newman¹, Jason R. Westin², Gianluca Gaidano³, Davide Rossi, Maximilian Diehn¹, Ash A. Alizadeh¹ - Show less +17 more•Institutions (3)

Stanford University¹, University of Texas MD Anderson Cancer Center², University of Eastern Piedmont³

07 Dec 2017-Blood

TL;DR: Baseline and interim ctDNA measurements have prognostic significance in aggressive lymphomas and are integrated with established risk-factors to develop a model to predict an individual9s disease risk.

...read moreread less

DOI•

Glmnet in Python

[...]

B. J. Balakumar, Han Fang, Trevor Hastie, Jerome H. Friedman, Robert Tibshirani, Noah Simon - Show less +2 more

10 Apr 2017