Journal Article

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

TL;DR: The expanded CMap is reported, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that is shown to be highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.
About: This article was published in Cell on 2017-11-30 and is currently open access. It has received 1943 citations to date.
Citations
Journal Article
03 Sep 2021 - iScience
TL;DR: In this paper, the authors developed and validated a computational method to select synergistic compound combinations based on transcriptomic profiles from both the disease and compound side, combined with a pathway scoring system, which was then validated prospectively by testing 30 compounds and their combinations on PANC-1 cells.

2 citations

Posted Content
TL;DR: In this paper, the authors developed an infinite width neural network framework for matrix completion that is simple, fast, and flexible, which is based on the connection between the infinite width limit of neural networks and kernels known as neural tangent kernels (NTK).
Abstract: Matrix completion problems arise in many applications including recommendation systems, computer vision, and genomics. Increasingly larger neural networks have been successful in many of these applications, but at considerable computational costs. Remarkably, taking the width of a neural network to infinity allows for improved computational performance. In this work, we develop an infinite width neural network framework for matrix completion that is simple, fast, and flexible. Simplicity and speed come from the connection between the infinite width limit of neural networks and kernels known as neural tangent kernels (NTK). In particular, we derive the NTK for fully connected and convolutional neural networks for matrix completion. The flexibility stems from a feature prior, which allows encoding relationships between coordinates of the target matrix, akin to semi-supervised learning. The effectiveness of our framework is demonstrated through competitive results for virtual drug screening and image inpainting/reconstruction. We also provide an implementation in Python to make our framework accessible on standard hardware to a broad audience.

2 citations

Journal Article
TL;DR: In this article, a glycoproteome-wide search for GalNAc-T14 substrates was conducted using lectin affinity chromatography followed by tandem mass spectrometry, which revealed that tenofovir was a major negative regulator of GalNAc-T14 substrates and an unfavorable anti-hepatitis B drug in HCC patients receiving sorafenib.
Abstract: Sorafenib is a first-line treatment for patients with advanced hepatocellular carcinoma (HCC). These patients may simultaneously receive anti-hepatitis B treatment if they are viremic. The N-Acetylgalactosaminyltransferase 14 (GALNT14) gene can serve as a biomarker to guide HCC treatments. However, the enzyme substrates of its gene product, GalNAc-T14 (a glycosyltransferase), remained uncharacterized. Here, we conducted a glycoproteome-wide search for GalNAc-T14 substrates using lectin affinity chromatography followed by tandem mass spectrometry. Seventeen novel GalNAc-T14 substrates were identified. A connective map analysis showed that an antiviral drug, tenofovir, was the leading medicinal compound to down-regulate the expression of these substrates. In vitro assays showed that HCC cells were resistant to sorafenib if pretreated by tenofovir but not entecavir. Clinical analysis showed that the concomitant use of tenofovir and sorafenib was a previously unrecognized predictive factor for unfavorable overall survival (hazard ratio = 2.060, 95% confidence interval = [1.256, 3.381], p = 0.004) in a cohort of 181 hepatitis-B-related, sorafenib-treated HCC patients (concomitant tenofovir versus entecavir treatment; p = 0.003). In conclusion, by conducting a glycoproteome-wide search for GalNAc-T14 substrates, we unexpectedly found that tenofovir was a major negative regulator of GalNAc-T14 substrates and an unfavorable anti-hepatitis B drug in HCC patients receiving sorafenib.
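As a quick consistency check on the survival statistics quoted in this abstract, the standard error of the log hazard ratio can be recovered from the 95% confidence interval, and from it an approximate z score and two-sided p value. The sketch below assumes the interval is a Wald interval that is symmetric on the log scale (the abstract does not say how it was computed); the function name `wald_p_from_ci` is illustrative only.

```python
import math
from statistics import NormalDist


def wald_p_from_ci(hr, lower, upper, level=0.95):
    """Approximate SE, z score, and two-sided p value for a hazard ratio.

    Assumption: the confidence interval is a Wald interval, symmetric
    around log(hr) on the log scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Values quoted in the abstract: HR = 2.060, 95% CI = [1.256, 3.381]
se, z, p = wald_p_from_ci(2.060, 1.256, 3.381)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under this assumption the recovered p value is roughly 0.004, in line with the value reported alongside the hazard ratio.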

2 citations

Journal Article
01 Mar 2023
TL;DR: In this article, the authors used drug repurposing approaches to discover small molecules that regulate the formation of definitive endoderm and can be used to improve stem cell differentiation protocols.
Abstract:
• A new drug repurposing pipeline designed to identify compounds with desired traits
• Identified definitive endoderm differentiation inducers, reducing growth factor needs
• A list of potential inducers for future experimental screens was compiled
• Endoderm-mesoderm single-cell RNA-seq reveals potential drug targets
Improving methods for human embryonic stem cell differentiation represents a challenge in modern regenerative medicine research. Using drug repurposing approaches, we discover small molecules that regulate the formation of definitive endoderm. Among them are inhibitors of known processes involved in endoderm differentiation (mTOR, PI3K, and JNK pathways) and a new compound, with an unknown mechanism of action, capable of inducing endoderm formation in the absence of growth factors in the media. Optimization of the classical protocol by inclusion of this compound achieves the same differentiation efficiency with a 90% cost reduction. The presented in silico procedure for candidate molecule selection has broad potential for improving stem cell differentiation protocols.
Over 400 million people worldwide live with diabetes, a chronic disease caused by insulin deficiency or resistance. 
A promising emerging treatment approach uses human embryonic stem cell (hESC)-derived insulin-producing cells to replace lost or dysfunctional patient cells. Current protocols for generating insulin-producing cells mimic pancreatic β cells’ developmental specification by differentiating hESCs into primitive streak (PS), an intermediate fate branchpoint (Davis et al., 2008), and then definitive endoderm (DE), a crucial pancreatic precursor (Mahaddalkar et al., 2020) (Figure S1A). Activation of the transforming growth factor β (TGF-β) and WNT pathways through extrinsic signals and inhibition of the PI3K/mTOR pathway lead hESCs to the PS branchpoint. These well-studied processes have known gene markers (Table S1). Improving differentiation protocols for DE from hESCs remains a research focus, with some protocols using CHIR99021 (CHIR), a WNT inducer, and the growth factor activin A (AA), a TGF-β inducer (Loh et al., 2014; Naujok et al., 2014) (Figure S1B). The complex manufacturing process and high cost of growth factors like AA make current DE differentiation protocols expensive. Small molecules are ideal replacements: they are more stable, easier to store, allow for greater specificity, and have greater activity and reproducibility (Pan and Liu, 2019). Small molecules can successfully induce differentiation, even in the absence of growth factors (Korostylev et al., 2017; Borowiak et al., 2009). However, small molecules rarely act on one single target, which can cause undesired off-target effects, and their discovery, for instance by high-throughput screening, is an expensive and time-consuming process (Waring et al., 2015). 
Drug repurposing is an alternative method for identifying compounds with desired properties (Pushpakom et al., 2019). The Connectivity Map (CMap) is a catalog of induced gene expression signatures for thousands of compounds (Lamb et al., 2006). It provides a platform for identifying potential inducers/inhibitors of a process of interest based on the similarity of the expression signatures of the process and those of compounds in the database. The application of CMap for drug repositioning has been demonstrated in different fields (Liu et al., 2015a; Zhang et al., 2015; Dyle et al., 2014; Brum et al., 2018). Recent studies increasingly utilize the Library of Integrated Network-based Cellular Signatures (LINCS) database, an expansion of the CMap catalog, including >500,000 gene expression signatures from the screening of >20,000 small molecules across 99 different cell lines (Subramanian et al., 2017). Here, using DE differentiation as a model developmental process, we describe three in silico screening approaches to discover small molecules that can be used for directed differentiation. To accomplish this, we used transcriptomic profiles of AA-based DE differentiation or pathway and transcription factor (TF) target enrichment to query the CMap/LINCS catalogs and identify candidate AA replacements. We tested the ability of a subset of these candidates to drive endoderm differentiation from hESCs. This approach presents an efficient alternative means of optimizing stem cell differentiation protocols.
To identify potential endoderm inducers, we built a profile of gene expression changes of hESC differentiation into DE (Figure S1A). We analyzed two publicly available bulk RNA sequencing (RNA-seq) time-series datasets that capture gene expression changes in hESCs after the addition of AA to the media (Figure S2A), GEO: GSE75748 (Chu et al., 2016) (H1 hESCs) and GSE109658 (Lu et al., 2018) (H9 hESCs). We then performed differential expression analysis at 0 (i.e., pluripotent stage) and 96 h (i.e., DE stage) and compared the differentially expressed genes between the two datasets. Both up- and down-regulated genes significantly overlapped between the two datasets, with 735 and 433 genes, respectively (Table S2). The analysis not only confirmed agreement between the two datasets (Figure S2B) but also highlighted groups of genes with similar and expected expression patterns (Table S1). For instance, the pluripotency markers POU5F1, SOX2, and NANOG showed a steady decrease of expression over time. In contrast, expression of mesoderm markers, such as TBXT, FGF4, or CDX1, peaked at 24 h, declining shortly after, while the expression of endoderm markers including SOX17, GATA6, and EOMES increased over time (Figure 1A). Notably, the number of differentially expressed genes at 96 h was in the order of thousands, which is consistent with the significant changes that cells undergo in response to AA treatment (Chu et al., 2016) (Figure S2C).
Next, we explored pathway activities at each time point using single-sample gene set enrichment scores (Hänzelmann et al., 2013) (Figures 1B and S3, left panels). For ease of interpretation, estimated activities of Hallmark pathways from the Molecular Signatures Database (MSigDB) (Liberzon et al., 2015) were clustered. DNA repair and spermatogenesis pathways were significantly up-regulated in early stages (Table S3), consistent with previous reports linking cell-cycle-related pathways with pluripotency (Chappell and Dalton, 2013; Hsu et al., 2019). Similarly, MYC and E2F targets showed a strong initial activity that became negligible in later time points. The following pathways were activated at later time points: p53, apoptosis, hypoxia, epithelial-mesenchymal transition (EMT), PI3K/AKT/mTOR, tumor necrosis factor α (TNF-α) via nuclear factor κB (NF-κB), interleukin-2 (IL-2)/STAT5 signaling, WNT, and TGF-β. All identified pathways were significant (Table S3). The activation of TGF-β, p53, and apoptotic pathways was consistent with expectations (Wang et al., 2017; Lanneau et al., 2007). However, activation of the PI3K/AKT/mTOR pathway was surprising, as it has been shown to promote neuroectoderm differentiation (Yu and Cui, 2016), and the roles of NF-κB and IL-2/STAT5 pathways in DE differentiation were unclear.
Using the same procedure, we assessed TF activity given the enrichment of their respective targets (Figures 1B and S3, right panels). POU3F and NFI, among others (Table S3), showed enhanced activity in early time points, which is consistent with their roles in neural plate development, possibly indicating the “readiness” of stem cells to adopt a neuroectoderm fate while awaiting an external signal (Iwafuchi-Doi et al., 2011). In the subsequent 48 h, different TFs became active. Interestingly, receptor-regulated SMADs displayed different patterns of activity, with SMAD3 and its co-regulator SMAD4 exhibiting a stable increase. The activity of TFs GLI1, -2, and -3 increased steadily over time, in agreement with the important role of the Hedgehog signaling pathway in endoderm development (Deol et al., 2017). We also observed an increase in the activity of HIF1A, a hypoxia-inducible factor. We could not find prior reports linking the observed JUN activity to the formation of endoderm lineage (Figure S3, right). Taken together, the analysis of the bulk RNA-seq data highlighted the importance of TGF-β, hypoxia, mTOR, and other pathways, as well as the TFs SMAD3 and -4 in the differentiation process.
Cell differentiation is a complex, heterogeneous, and continuous process; its analysis at single-cell resolution could potentially reveal novel TFs guiding cell fate decisions, as well as new intermediate cell types. We therefore performed single-cell RNA-seq (scRNA-seq) of the PS branchpoint (Figure 2A). We profiled at 36 and 72 h post-induction with AA and at 72 h post-induction with CHIR, which induces mesoderm (ME) differentiation. After data pre-processing and clustering, cells were annotated into different developmental stages (hESC, PS, ME, and DE) based on known gene markers whose expression correlated well with the sampled time points (Figure 2B; Table S1). Then, we identified differentially expressed TFs in DE, ME, and PS. We hypothesized that TFs inducing DE lineage would be differentially expressed in both PS (the branchpoint) and DE. Similarly, common differentially expressed TFs in PS and ME would induce ME lineage. The set of DE inducers included known TFs such as EOMES, GATA6, GSC, HHEX, LHX1, and OTX2, whereas FOXH1, MSX1, SP5, and TBXT were among the set of ME inducers (Table S4). Pseudotime analysis revealed that both groups of genes became strongly up-regulated through the course of differentiation at each respective stage, supporting the role of these TFs in regulating lineage specification (Figures 2C and 2D). 
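The selection rule stated above (a TF is a candidate DE inducer if it is differentially expressed in both PS and DE, and a candidate ME inducer if it is differentially expressed in both PS and ME) amounts to two set intersections. A minimal sketch follows; the per-stage sets are toy inputs invented for illustration (the `TF_A`, `TF_B`, and `TF_C` entries are placeholders), not the contents of Table S4, and the function name is hypothetical.

```python
def candidate_inducers(tfs_by_stage):
    """Intersect the PS set with the DE and ME sets of differentially
    expressed TFs to get candidate lineage inducers."""
    ps = tfs_by_stage["PS"]
    return {
        "DE": sorted(ps & tfs_by_stage["DE"]),
        "ME": sorted(ps & tfs_by_stage["ME"]),
    }


# Toy, hypothetical per-stage differential expression results
toy = {
    "PS": {"EOMES", "GATA6", "TBXT", "SP5", "TF_A"},
    "DE": {"EOMES", "GATA6", "TF_B"},
    "ME": {"TBXT", "SP5", "TF_C"},
}
inducers = candidate_inducers(toy)
print(inducers)
```

TFs that are differentially expressed in only one stage (the placeholders here) drop out of both candidate lists.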
Next, we identified “active” TFs during the course of differentiation using SCENIC (Aibar et al., 2017). Overall, SCENIC results were in agreement with those of the bulk RNA-seq analysis. Consistent with previous reports, SCENIC identified endoderm-related factors, including LHX1 and KLF6/8, OTX2 and FOXA2, SOX17, EOMES and ETS1/2, FOXQ1, and SIX3. It also revealed potentially new markers such as RREB1, CUX1, IRF9, DDIT3, TFAP2C, PBX3, JUN, and JUND (Figure 2E). Notably, RREB1 (Lee et al., 2012) and CUX1 (Ripka et al., 2010) are important for the development of the primitive gut tube and pancreas, while DDIT3 is a known apoptosis inducer (Papathanasiou et al., 1991), and TFAP2C might be involved in mesoderm specification (Madrigal et al., 2020). The identification of DDIT3, IRF, JUN, and TFAP2 was consistent with the previous bulk RNA-seq results (Figure 1B). However, the role of these genes in endoderm formation is not yet understood. For ME, we identified the following activated TFs: CDX1/2/4, which are involved in hematopoiesis (Paik et al., 2013), TCF4, LEF1, SP5, HOXB, TBX6, and FOXH1, which is also involved in heart development (von Both et al., 2004). The role of other top-ranked TFs in ME development, such as MAX, ZNF90, DBP, E2F1, and TFDP2, remains unknown (Figure 2E). SCENIC also assigned high activities to the pluripotency markers POU5F1 (OCT4), NANOG, and MYC in stem cells. SCENIC did not detect any enriched TFs in PS cells.
Finally, we performed gene set enrichment analysis (GSEA) at the single-cell level, identifying high TGF-β and low WNT pathway activities in DE cells (Figure 2F). The activity of the mTORC1 Hallmark pathway was reduced in DE cells compared with hESCs and PS. We also observed down-regulation of the MYC pathway and activation of the hypoxia pathway in DE and ME compared with hESCs. 
GSEA of SMAD targets revealed a strong enrichment of SMAD2, -3, and -4 in PS and DE stages, corroborating their important role in DE differentiation. These results were consistent with those of the bulk RNA-seq data (Figure 1B). Taken together, scRNA-seq provided additional evidence for the importance of TGF-β/SMAD activation and mTOR/WNT inhibition in DE differentiation. It also uncovered TFs that could be important for lineage specification at the PS branchpoint.
We performed a CMap analysis using ssCMap (Zhang and Gant, 2009), which relies on the initial release of CMap, on 300 common up- and down-regulated genes from the bulk RNA-seq analysis (150 for each cohort; Figure 3A, right panel). Similarity of ssCMap profiles to the DE differentiation profile was assessed by GSEA (Subramanian et al., 2005). The analysis led to the identification of the PI3K inhibitors LY-294002 (LY29) and wortmannin, as well as the mTOR inhibitor sirolimus, whose expression profiles significantly correlated with the gene expression signature of DE (PC3 cell line) (Figure 3B; Table S5). The utility of this approach is highlighted by previous studies in which LY29 and wortmannin have been used along with AA to induce DE (Pan and Liu, 2019), and mTOR inhibition has been shown to improve PS induction (Zhou et al., 2009). Using the signatures of the same 300 common up-/down-regulated genes, we performed GSEA on the updated CMap/LINCS catalog. We set the enrichment score threshold to 0.35, resulting in the selection of 313 out of ∼200,000 conditions corresponding to 298 unique compounds (Figure 3C; Table S5). Some of the top identified hits were heat shock protein 90 (HSP90) and histone deacetylase (HDAC) inhibitors. HSP90 has been reported to inhibit mesoderm and endoderm fates (Bradley et al., 2012), while HDAC inhibitors have been linked to endoderm lineage formation (Zhou et al., 2007). Together with ssCMAP, this analysis resulted in 299 (298 that include both sirolimus and wortmannin plus LY29) unique compounds. Later, we refer to this approach as “ssCMAP/LINCS” (Figures 3B and 3C).
As an alternative to using a subset of genes for GSEA in drug reference profiles, we compared the differential expression metric of all genes (i.e., the T-statistic) with the LINCS perturbation Z scores using Pearson and Spearman correlations (Figure 3A, middle panel). 
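The enrichment-score thresholding used for the CMap/LINCS query above can be illustrated with a weighted running-sum statistic in the style of GSEA. This is a sketch under stated assumptions, not the authors' pipeline: the text does not say how up- and down-regulated sets were combined, so the example scores a single gene set; the function names and the toy profiles are invented for illustration, and only the 0.35 cutoff comes from the text.

```python
def enrichment_score(scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    scores   : dict mapping gene -> ranking metric (e.g. a z score)
    gene_set : query genes (e.g. up-regulated signature genes)
    Returns the running-sum deviation from zero with the largest magnitude.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    total_hit = sum(abs(scores[g]) ** weight for g in hits)
    if not hits or n_miss == 0 or total_hit == 0:
        raise ValueError("need weighted hits and at least one miss")

    cum_hit = 0.0
    cum_miss = 0
    best = 0.0
    for g in ranked:
        if g in gene_set:
            cum_hit += abs(scores[g]) ** weight  # step up on a hit
        else:
            cum_miss += 1                        # step down on a miss
        deviation = cum_hit / total_hit - cum_miss / n_miss
        if abs(deviation) > abs(best):
            best = deviation
    return best


def select_conditions(profiles, gene_set, threshold=0.35):
    """Keep conditions whose enrichment score reaches the threshold."""
    return [name for name, scores in profiles.items()
            if enrichment_score(scores, gene_set) >= threshold]


# Toy, made-up perturbation profiles (gene -> score)
profiles = {
    "cond_A": {"g1": 4.0, "g2": 3.0, "g3": 2.0, "g4": 1.0},
    "cond_B": {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0},
}
selected = select_conditions(profiles, {"g1", "g2"})
print(selected)
```

In the toy data the query genes sit at the top of `cond_A` and at the bottom of `cond_B`, so only `cond_A` passes the cutoff.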
Concretely, we looked at how compound profiles from LINCS are correlated with the full gene expression profiles from both public RNA-seq datasets (GEO: GSE75748 and GSE109658). This approach (henceforth called the “correlation” approach; Figures 3D and 3E) has the advantage that it avoids thresholding (no fixed size gene sets). Both correlation metrics agreed well within and between datasets (Figures 3D and 3E; Table S5), and results were consistent with the “ssCMAP/LINCS” approach: we observed the presence of PI3K and mTOR inhibitors, including LY29, wortmannin, and sirolimus, as well as HDAC and HSP90 inhibitors, such as geldanamycin and tacedinaline. Setting a threshold of 0.15 on the Pearson and Spearman correlations, we identified 395 unique compounds. The overlap with the 299 “ssCMAP/LINCS” compounds was 37 unique molecules.
During the bulk and scRNA-seq data analyses, we observed TFs/pathways that could potentially induce DE differentiation (Figures 1B and 3A, left panel), including SMAD2, -3, and -4, known to be essential for the activation of the TGF-β signaling pathway (Nakao et al., 1997). As an alternative to looking at individual genes, we searched for compounds that could activate the pathways and respective TFs. Using GSEA, we identified a list of molecules from LINCS whose profiles were enriched for TGF-β pathways, as well as for genes regulated by SMADs (Figure 3F). This “pathway/TF” approach is based on comparing pathway/TF enrichment profiles instead of individual genes. Briefly, we focused on the cell line with the most abundant data in LINCS, MCF-7. 
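The “correlation” approach described above can be sketched as follows: for each compound, correlate the vector of differential expression T-statistics with the compound's Z-score profile over the same genes, then keep compounds above the cutoff. This is a minimal standard-library sketch with made-up numbers. Requiring both Pearson and Spearman to exceed the threshold is an assumption (the text does not state how the two metrics were combined), and all function and compound names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_compounds(t_stats, z_profiles, threshold=0.15):
    """Assumption: a compound passes only if BOTH correlations exceed
    the threshold."""
    return [name for name, z in z_profiles.items()
            if pearson(t_stats, z) > threshold
            and spearman(t_stats, z) > threshold]


# Toy data: one T-statistic per gene, and per-compound Z scores
# over the same genes in the same order (all values invented).
t_stats = [2.0, -1.0, 0.5, 3.0, -2.5]
z_profiles = {
    "cmpd_A": [1.8, -0.7, 0.2, 2.9, -2.0],   # mimics the signature
    "cmpd_B": [-1.0, 0.5, 0.1, -2.0, 1.5],   # opposes the signature
}
hits = select_compounds(t_stats, z_profiles)
print(hits)
```

Because every gene contributes to the correlation, no fixed-size gene set has to be chosen, which is the advantage the text attributes to this approach.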
Overall, 1,287 compounds were enriched for the TGF-β signaling pathway according to definitions from either Hallmark (Liberzon et al., 2015) or KEGG pathways (Ogata et al., 1999). Similarly, 1,062 compounds were enriched for SMAD targets according to the RegNetwork database (Liu et al., 2015b). The full set of results is provided in Table S6. A total of 408 unique molecules overlap between both datasets (Table S5).

Taken together, we applied three drug repurposing strategies (Figure 3). The overall number of unique molecules was 999, with seven in common (Figure 3G; Table S5). For further experimental validation, we focused on the seven common compounds. We also purchased 10 molecules from the “pathway/TF” approach for additional testing, selected based on cost and availability. The reasons for this additional purchase are detailed in the discussion.

To test candidate compounds for their ability to induce DE formation, we designed an endoderm reporter using CRISPR, as previously described (Krentz et al., 2014): a SOX17-mNeonGreen (SOX17-mNG) knockin H1 hESC line that preserved the endogenous SOX17 mRNA sequence (Figure 4A). Wild-type H1 hESCs and SOX17-mNG cells were differentiated into DE cells for 3 days (Figures 4B–4D) using the standard protocol: 100 ng/mL AA + 3 μM CHIR for 1 day, followed by 100 ng/mL AA for 2 days (i.e., AC-A-A; Figure 4E). No changes in the number of CXCR4+ cells were observed by fluorescence-activated cell sorting (FACS) (Figure 5A), suggesting that SOX17-mNG targeting did not impact endoderm differentiation.

Figure 5. Characterization of DE cells induced by small molecules. (A) FACS data of DE cells differentiated from ESCs for 72 h by each treatment. DE markers
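
The “correlation” approach described in the excerpt above (Pearson and Spearman correlation between a query expression profile and each compound profile, thresholded at 0.15) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy profiles, and the choice to require both metrics to pass the threshold are assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile; treat as 0 here
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_compounds(query, profiles, threshold=0.15):
    """Names of compounds whose profile passes the threshold on both metrics.

    Requiring both metrics is an assumption; the excerpt does not say
    whether one or both had to exceed 0.15.
    """
    return sorted(
        name
        for name, profile in profiles.items()
        if pearson(query, profile) >= threshold
        and spearman(query, profile) >= threshold
    )


# Hypothetical toy data: one query profile and two compound profiles over 5 genes.
query_profile = [2.0, 1.0, -1.0, -2.0, 0.5]
compound_profiles = {
    "cmpd_a": [1.8, 0.9, -0.7, -2.2, 0.1],    # similar direction of change
    "cmpd_b": [-2.0, -1.0, 1.0, 2.0, -0.5],   # opposite direction of change
}
selected = select_compounds(query_profile, compound_profiles)
```

Because the whole profile is used, no gene set size has to be fixed in advance, which is the advantage the excerpt attributes to this approach.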

2 citations

Posted ContentDOI
03 Aug 2018-bioRxiv
TL;DR: The potential of OLSA to decompose the effects of a drug and identify its basic components is indicated: based on the factor including PI3K/AKT/mTORC1 inhibition activity, 5 compounds were predicted to be novel autophagy inducers, and other analysis including western blotting revealed that 4 of the 5 actually induced autophagy.
Abstract: Drugs have multiple, not single, effects. Decomposition of drug effects into basic components helps us to understand the pharmacological properties of a drug and contributes to drug discovery. We have extended factor analysis and developed a novel profile data analysis method, orthogonal linear separation analysis (OLSA). OLSA contracted 11,911 genes to 118 factors from transcriptome data of MCF7 cells treated with 318 compounds in Connectivity Map. Ontology analysis of the main genes constituting the factors detected significant enrichment in 65 of 118 factors, and similar results were obtained in two other data sets. One factor discriminated two Hsp90 inhibitors, geldanamycin and radicicol, while clustering analysis could not. Doxorubicin was estimated to inhibit Na+/K+ ATPase, one of the suggested mechanisms of doxorubicin-induced cardiotoxicity. Based on the factor including PI3K/AKT/mTORC1 inhibition activity, 5 compounds were predicted to be novel autophagy inducers, and other analysis including western blotting revealed that 4 of the 5 actually induced autophagy. These findings indicate the potential of OLSA to decompose the effects of a drug and identify its basic components.

2 citations


Cites methods from "A Next Generation Connectivity Map:..."

  • ...Of note, the Connectivity Map (CMap) project initiated by Broad Institute greatly contributed to the field (Lamb et al, 2006; Subramanian et al, 2017)....

    [...]

References
Journal ArticleDOI
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
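
As a rough illustration of the gene-set idea in the abstract above, the sketch below computes a weighted running-sum enrichment score over a ranked gene list, in the spirit of the statistic used by GSEA. It is a simplified sketch and not the GSEA software: the function name, the toy genes, and the default weight `p=1.0` are assumptions, and the permutation-based significance testing and normalization are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Maximum signed deviation from zero of a weighted running sum.

    ranked_genes: gene names ordered from most positive to most negative score.
    scores: dict mapping gene name to its ranking score (e.g. a correlation).
    gene_set: set of gene names to test for enrichment.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_total = len(ranked_genes)
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n_total:
        return 0.0  # no contrast between members and non-members

    hit_total = sum(
        abs(scores[g]) ** p for g, hit in zip(ranked_genes, in_set) if hit
    )
    if hit_total == 0:
        return 0.0
    miss_step = 1.0 / (n_total - n_hits)

    running, best = 0.0, 0.0
    for gene, hit in zip(ranked_genes, in_set):
        if hit:
            running += abs(scores[gene]) ** p / hit_total  # step up on a member
        else:
            running -= miss_step                           # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy ranking of six genes.
toy_scores = {"g1": 3.0, "g2": 2.0, "g3": 1.0, "g4": -1.0, "g5": -2.0, "g6": -3.0}
toy_ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
es_top = enrichment_score(toy_ranking, toy_scores, {"g1", "g2"})
es_bottom = enrichment_score(toy_ranking, toy_scores, {"g5", "g6"})
```

A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom a negative one; in practice the score is compared against a null distribution obtained by permutation.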

34,830 citations

Journal Article
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

30,124 citations

Journal ArticleDOI
TL;DR: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data and provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments.
Abstract: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.

10,968 citations

Journal ArticleDOI
TL;DR: This paper describes how BLAT was optimized; BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.
Abstract: Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
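
The first stage described in the abstract above (an index of nonoverlapping K-mers in the genome, used to find candidate homologous regions) can be sketched as follows. This is a toy illustration and not BLAT itself: the function names and sequences are hypothetical, only exact K-mer matches are considered, and scanning the query at every offset is an assumption that the abstract does not spell out.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each nonoverlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    # Step by k so that indexed k-mers do not overlap; this keeps the
    # index roughly k times smaller than indexing every position.
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_seed_hits(query, index, k):
    """Return (query_pos, genome_pos) pairs where a query k-mer is in the index."""
    hits = []
    # The query is scanned at every offset so that a match can still be
    # found whatever its alignment to the genome's k-mer grid.
    for query_pos in range(len(query) - k + 1):
        for genome_pos in index.get(query[query_pos:query_pos + k], ()):
            hits.append((query_pos, genome_pos))
    return hits


# Hypothetical toy sequences.
toy_genome = "ACGTACGGTTCAGT"
toy_index = build_index(toy_genome, k=4)
toy_hits = find_seed_hits("GGACGGTT", toy_index, k=4)
```

Each hit is only a seed: hits that fall on the same diagonal (equal `genome_pos - query_pos`) would be the candidates passed on to the alignment and stitching stages the abstract lists.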

8,326 citations

Journal ArticleDOI
TL;DR: This paper proposed parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Abstract: Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.

6,319 citations

Related Papers (5)