Showing papers by "Wellcome Trust Sanger Institute published in 2017"


Journal ArticleDOI
20 Jun 2017-JAMA
TL;DR: A large prospective cohort study of BRCA1 and BRCA2 mutation carriers recruited in 1997-2011 provides age-specific estimates of breast, ovarian, and contralateral breast cancer risk and evaluates how family cancer history and mutation location modify that risk.
Abstract: Importance: The clinical management of BRCA1 and BRCA2 mutation carriers requires accurate, prospective cancer risk estimates. Objectives: To estimate age-specific risks of breast, ovarian, and contralateral breast cancer for mutation carriers and to evaluate risk modification by family cancer history and mutation location. Design, Setting, and Participants: Prospective cohort study of 6036 BRCA1 and 3820 BRCA2 female carriers (5046 unaffected and 4810 with breast or ovarian cancer or both at baseline) recruited in 1997-2011 through the International BRCA1/2 Carrier Cohort Study, the Breast Cancer Family Registry and the Kathleen Cuningham Foundation Consortium for Research into Familial Breast Cancer, with ascertainment through family clinics (94%) and population-based studies (6%). The majority were from large national studies in the United Kingdom (EMBRACE), the Netherlands (HEBON), and France (GENEPSO). Follow-up ended December 2013; median follow-up was 5 years. Exposures: BRCA1/2 mutations, family cancer history, and mutation location. Main Outcomes and Measures: Annual incidences, standardized incidence ratios, and cumulative risks of breast, ovarian, and contralateral breast cancer. Results: Among 3886 women (median age, 38 years; interquartile range [IQR], 30-46 years) eligible for the breast cancer analysis, 5066 women (median age, 38 years; IQR, 31-47 years) eligible for the ovarian cancer analysis, and 2213 women (median age, 47 years; IQR, 40-55 years) eligible for the contralateral breast cancer analysis, 426 were diagnosed with breast cancer, 109 with ovarian cancer, and 245 with contralateral breast cancer during follow-up. The cumulative breast cancer risk to age 80 years was 72% (95% CI, 65%-79%) for BRCA1 and 69% (95% CI, 61%-77%) for BRCA2 carriers. 
Breast cancer incidences increased rapidly in early adulthood until ages 30 to 40 years for BRCA1 and until ages 40 to 50 years for BRCA2 carriers, then remained at a similar, constant incidence (20-30 per 1000 person-years) until age 80 years. The cumulative ovarian cancer risk to age 80 years was 44% (95% CI, 36%-53%) for BRCA1 and 17% (95% CI, 11%-25%) for BRCA2 carriers. For contralateral breast cancer, the cumulative risk 20 years after breast cancer diagnosis was 40% (95% CI, 35%-45%) for BRCA1 and 26% (95% CI, 20%-33%) for BRCA2 carriers (hazard ratio [HR] for comparing BRCA2 vs BRCA1, 0.62; 95% CI, 0.47-0.82; P=.001 for difference). Breast cancer risk increased with increasing number of first- and second-degree relatives diagnosed as having breast cancer for both BRCA1 (HR for ≥2 vs 0 affected relatives, 1.99; 95% CI, 1.41-2.82; P<.001 for trend) and BRCA2 carriers (HR, 1.91; 95% CI, 1.08-3.37; P=.02 for trend). Breast cancer risk was higher if mutations were located outside vs within the regions bounded by positions c.2282-c.4071 in BRCA1 (HR, 1.46; 95% CI, 1.11-1.93; P=.007) and c.2831-c.6401 in BRCA2 (HR, 1.93; 95% CI, 1.36-2.74; P<.001). Conclusions and Relevance: These findings provide estimates of cancer risk based on BRCA1 and BRCA2 mutation carrier status using prospective data collection and demonstrate the potential importance of family history and mutation location in risk assessment.

1,733 citations
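
The BRCA1/2 abstract above reports both annual incidences (20-30 per 1000 person-years) and cumulative risks. As a rough illustration of how the two quantities relate, the sketch below converts a series of yearly incidence rates into a cumulative risk using risk = 1 - exp(-sum of rates). This is a textbook approximation that ignores competing risks, not the authors' estimation method, and the flat rate of 25 per 1000 person-years is a hypothetical input chosen from within the reported range.

```python
import math


def cumulative_risk(annual_rates):
    """Cumulative risk from yearly incidence rates (events per person-year).

    Uses risk = 1 - exp(-sum of rates) and ignores competing risks such as
    death from other causes, so it is an approximation only.
    """
    return 1.0 - math.exp(-sum(annual_rates))


# Hypothetical input: a flat 25 per 1000 person-years for the 40 years
# between age 40 and age 80 (a value inside the reported 20-30 range).
rates_40_to_80 = [25 / 1000] * 40
risk = cumulative_risk(rates_40_to_80)
print(f"Cumulative risk between ages 40 and 80: {risk:.1%}")
```

Under these assumptions the result is roughly 63% for a woman who is cancer-free at age 40. It is not expected to reproduce the published 72% and 69% figures, which also cover earlier ages and age-varying incidence.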


Journal ArticleDOI
TL;DR: COSMIC v78 contains wide resistance mutation profiles across 20 drugs, detailing the recurrence of 301 unique resistance alleles across 1934 drug-resistant tumours.
Abstract: COSMIC, the Catalogue of Somatic Mutations in Cancer (http://cancer.sanger.ac.uk) is a high-resolution resource for exploring targets and trends in the genetics of human cancer. Currently the broadest database of mutations in cancer, the information in COSMIC is curated by expert scientists, primarily by scrutinizing large numbers of scientific publications. Over 4 million coding mutations are described in v78 (September 2016), combining genome-wide sequencing results from 28 366 tumours with complete manual curation of 23 489 individual publications focused on 186 key genes and 286 key fusion pairs across all cancers. Molecular profiling of large tumour numbers has also allowed the annotation of more than 13 million non-coding mutations, 18 029 gene fusions, 187 429 genome rearrangements, 1 271 436 abnormal copy number segments, 9 175 462 abnormal expression variants and 7 879 142 differentially methylated CpG dinucleotides. COSMIC now details the genetics of drug resistance, novel somatic gene mutations which allow a tumour to evade therapeutic cancer drugs. Focusing initially on highly characterized drugs and genes, COSMIC v78 contains wide resistance mutation profiles across 20 drugs, detailing the recurrence of 301 unique resistance alleles across 1934 drug-resistant tumours. All information from the COSMIC database is available freely on the COSMIC website.

1,674 citations


Journal ArticleDOI
Aviv Regev, Sarah A. Teichmann, Eric S. Lander, Ido Amit, Christophe Benoist, Ewan Birney, Bernd Bodenmiller, Peter J. Campbell, Piero Carninci, Menna R. Clatworthy, Hans Clevers, Bart Deplancke, Ian Dunham, James Eberwine, Roland Eils, Wolfgang Enard, Andrew Farmer, Lars Fugger, Berthold Göttgens, Nir Hacohen, Muzlifah Haniffa, Martin Hemberg, Seung K. Kim, Paul Klenerman, Arnold R. Kriegstein, Ed S. Lein, Sten Linnarsson, Emma Lundberg, Joakim Lundeberg, Partha P. Majumder, John C. Marioni, Miriam Merad, Musa M. Mhlanga, Martijn C. Nawijn, Mihai G. Netea, Garry P. Nolan, Dana Pe'er, Anthony Phillipakis, Chris P. Ponting, Stephen R. Quake, Wolf Reik, Orit Rozenblatt-Rosen, Joshua R. Sanes, Rahul Satija, Ton N. Schumacher, Alex K. Shalek, Ehud Shapiro, Padmanee Sharma, Jay W. Shin, Oliver Stegle, Michael R. Stratton, Michael J. T. Stubbington, Fabian J. Theis, Matthias Uhlen, Alexander van Oudenaarden, Allon Wagner, Fiona M. Watt, Jonathan S. Weissman, Barbara J. Wold, Ramnik J. Xavier, Nir Yosef, Human Cell Atlas Meeting Participants
05 Dec 2017-eLife
TL;DR: An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease.
Abstract: The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.

1,391 citations


Journal ArticleDOI
TL;DR: It is demonstrated that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients and achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach.
Abstract: Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.

1,120 citations
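
The SC3 abstract above describes combining multiple clustering solutions through a consensus approach. A minimal sketch of that general idea follows: given several label vectors for the same cells, it builds a consensus matrix whose entry (i, j) is the fraction of clusterings that place cells i and j in the same cluster. The function name and the toy labelings are hypothetical, and this is not the SC3 package's implementation; a final clustering of the consensus matrix would normally follow and is omitted here.

```python
def consensus_matrix(labelings):
    """Fraction of clusterings in which each pair of cells shares a cluster.

    labelings: list of label lists, one per clustering, all of equal length.
    """
    n_cells = len(labelings[0])
    counts = [[0] * n_cells for _ in range(n_cells)]
    for labels in labelings:
        for i in range(n_cells):
            for j in range(n_cells):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Divide once at the end to avoid accumulating rounding error.
    return [[c / len(labelings) for c in row] for row in counts]


# Toy data: three hypothetical clusterings of four cells.
# Label values are arbitrary; only co-membership matters.
labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
matrix = consensus_matrix(labelings)
for row in matrix:
    print([round(value, 2) for value in row])
```

Because the matrix depends only on co-membership, the first two toy clusterings count as identical even though their label values are swapped, which is what makes this representation convenient for merging solutions from different runs.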


Journal ArticleDOI
TL;DR: This review covers the use of eDNA metabarcoding for surveying animal and plant richness and the challenges in using eDNA approaches to estimate relative abundance, and distills what is known about the ability of different eDNA sample types to approximate richness in space and across time.
Abstract: The genomic revolution has fundamentally changed how we survey biodiversity on earth. High-throughput sequencing ("HTS") platforms now enable the rapid sequencing of DNA from diverse kinds of environmental samples (termed "environmental DNA" or "eDNA"). Coupling HTS with our ability to associate sequences from eDNA with a taxonomic name is called "eDNA metabarcoding" and offers a powerful molecular tool capable of noninvasively surveying species richness from many ecosystems. Here, we review the use of eDNA metabarcoding for surveying animal and plant richness, and the challenges in using eDNA approaches to estimate relative abundance. We highlight eDNA applications in freshwater, marine and terrestrial environments, and in this broad context, we distill what is known about the ability of different eDNA sample types to approximate richness in space and across time. We provide guiding questions for study design and discuss the eDNA metabarcoding workflow with a focus on primers and library preparation methods. We additionally discuss important criteria for consideration of bioinformatic filtering of data sets, with recommendations for increasing transparency. Finally, looking to the future, we discuss emerging applications of eDNA metabarcoding in ecology, conservation, invasion biology, biomonitoring, and how eDNA metabarcoding can empower citizen science and biodiversity education.

1,038 citations


Journal ArticleDOI
16 Nov 2017-Cell
TL;DR: This work adapted methods from molecular evolution and applied them to 7,664 tumors across 29 cancer types, allowing exome-wide enumeration of all driver coding mutations, including outside known cancer genes.

938 citations


Journal ArticleDOI
TL;DR: The wide-ranging biomedical utilities of PLC-derived organoid models in furthering the understanding of liver cancer biology and in developing personalized-medicine approaches for the disease are demonstrated.
Abstract: Human liver cancer research currently lacks in vitro models that can faithfully recapitulate the pathophysiology of the original tumor. We recently described a novel, near-physiological organoid culture system, wherein primary human healthy liver cells form long-term expanding organoids that retain liver tissue function and genetic stability. Here we extend this culture system to the propagation of primary liver cancer (PLC) organoids from three of the most common PLC subtypes: hepatocellular carcinoma (HCC), cholangiocarcinoma (CC) and combined HCC/CC (CHC) tumors. PLC-derived organoid cultures preserve the histological architecture, gene expression and genomic landscape of the original tumor, allowing for discrimination between different tumor tissues and subtypes, even after long-term expansion in culture in the same medium conditions. Xenograft studies demonstrate that the tumorigenic potential, histological features and metastatic properties of PLC-derived organoids are preserved in vivo. PLC-derived organoids are amenable for biomarker identification and drug-screening testing and led to the identification of the ERK inhibitor SCH772984 as a potential therapeutic agent for primary liver cancer. We thus demonstrate the wide-ranging biomedical utilities of PLC-derived organoid models in furthering the understanding of liver cancer biology and in developing personalized-medicine approaches for the disease.

831 citations


Journal ArticleDOI
TL;DR: This work identified 25 new susceptibility loci, 3 of which contain integrin genes that encode proteins in pathways that have been identified as important therapeutic targets in inflammatory bowel disease, and found that the associated variants are correlated with expression changes in response to immune stimulus at two of these genes.
Abstract: Genetic association studies have identified 215 risk loci for inflammatory bowel disease, thereby uncovering fundamental aspects of its molecular biology. We performed a genome-wide association study of 25,305 individuals and conducted a meta-analysis with published summary statistics, yielding a total sample size of 59,957 subjects. We identified 25 new susceptibility loci, 3 of which contain integrin genes that encode proteins in pathways that have been identified as important therapeutic targets in inflammatory bowel disease. The associated variants are correlated with expression changes in response to immune stimulus at two of these genes (ITGA4 and ITGB8) and at previously implicated loci (ITGAL and ICAM1). In all four cases, the expression-increasing allele also increases disease risk. We also identified likely causal missense variants in a gene implicated in primary immune deficiency, PLCG2, and a negative regulator of inflammation, SLAMF8. Our results demonstrate that new associations at common variants continue to identify genes relevant to therapeutic target identification and prioritization.

813 citations


Posted ContentDOI
12 Jul 2017-bioRxiv
TL;DR: The integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types represents the most comprehensive look at cancer whole genomes to date.
Abstract: We report the integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types. By studying whole genomes we have been able to catalogue non-coding cancer driver events, study patterns of structural variation, infer tumour evolution, probe the interactions among variants in the germline genome, the tumour genome and the transcriptome, and derive an understanding of how coding and non-coding variations together contribute to driving individual patient's tumours. This work represents the most comprehensive look at cancer whole genomes to date. NOTE TO READERS: This is an incomplete draft of the marker paper for the Pan-Cancer Analysis of Whole Genomes Project, and is intended to provide the background information for a series of in-depth papers that will be posted to bioRxiv during the summer of 2017.

735 citations


Journal ArticleDOI
TL;DR: In this article, a weighted model called HRDetect was developed to accurately detect BRCA1/BRCA2-deficient samples with 98.7% sensitivity (area under the curve (AUC) = 0.98).
Abstract: Approximately 1-5% of breast cancers are attributed to inherited mutations in BRCA1 or BRCA2 and are selectively sensitive to poly(ADP-ribose) polymerase (PARP) inhibitors. In other cancer types, germline and/or somatic mutations in BRCA1 and/or BRCA2 (BRCA1/BRCA2) also confer selective sensitivity to PARP inhibitors. Thus, assays to detect BRCA1/BRCA2-deficient tumors have been sought. Recently, somatic substitution, insertion/deletion and rearrangement patterns, or 'mutational signatures', were associated with BRCA1/BRCA2 dysfunction. Herein we used a lasso logistic regression model to identify six distinguishing mutational signatures predictive of BRCA1/BRCA2 deficiency. A weighted model called HRDetect was developed to accurately detect BRCA1/BRCA2-deficient samples. HRDetect identifies BRCA1/BRCA2-deficient tumors with 98.7% sensitivity (area under the curve (AUC) = 0.98). Application of this model in a cohort of 560 individuals with breast cancer, of whom 22 were known to carry a germline BRCA1 or BRCA2 mutation, allowed us to identify an additional 22 tumors with somatic loss of BRCA1 or BRCA2 and 47 tumors with functional BRCA1/BRCA2 deficiency where no mutation was detected. We validated HRDetect on independent cohorts of breast, ovarian and pancreatic cancers and demonstrated its efficacy in alternative sequencing strategies. Integrating all of the classes of mutational signatures thus reveals a larger proportion of individuals with breast cancer harboring BRCA1/BRCA2 deficiency (up to 22%) than hitherto appreciated (∼1-5%) who could have selective therapeutic sensitivity to PARP inhibition.

710 citations
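
The HRDetect abstract above describes a weighted model, built with lasso logistic regression, that turns mutational-signature features into a prediction of BRCA1/BRCA2 deficiency. The sketch below shows only the scoring step of a generic weighted logistic model: a weighted sum of features passed through the logistic function. The feature names, weights and intercept are hypothetical placeholders, not the published HRDetect coefficients, and any feature normalisation the real model requires is not shown.

```python
import math


def weighted_logistic_score(features, weights, intercept):
    """Probability from a weighted logistic model.

    features and weights are dicts keyed by feature name; every feature
    must have a matching weight.
    """
    linear = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical weights for three illustrative signature features.
example_weights = {"signature_a": 1.5, "signature_b": 2.0, "signature_c": -0.5}
example_intercept = -3.0

# Hypothetical, already-normalised feature values for one tumour sample.
sample = {"signature_a": 1.2, "signature_b": 0.9, "signature_c": 0.1}
probability = weighted_logistic_score(sample, example_weights, example_intercept)
print(f"Predicted probability of deficiency: {probability:.2f}")
```

A lasso penalty during fitting drives the weights of uninformative features to exactly zero, which is how such a model can end up with a small set of distinguishing signatures like the six mentioned in the abstract.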


Journal ArticleDOI
07 Dec 2017-Nature
TL;DR: Together, these data define METTL3 as a regulator of a chromatin-based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia.
Abstract: N6-methyladenosine (m6A) is an abundant internal RNA modification in both coding and non-coding RNAs that is catalysed by the METTL3-METTL14 methyltransferase complex. However, the specific role of these enzymes in cancer is still largely unknown. Here we define a pathway that is specific for METTL3 and is implicated in the maintenance of a leukaemic state. We identify METTL3 as an essential gene for growth of acute myeloid leukaemia cells in two distinct genetic screens. Downregulation of METTL3 results in cell cycle arrest, differentiation of leukaemic cells and failure to establish leukaemia in immunodeficient mice. We show that METTL3, independently of METTL14, associates with chromatin and localizes to the transcriptional start sites of active genes. The vast majority of these genes have the CAATT-box binding protein CEBPZ present at the transcriptional start site, and this is required for recruitment of METTL3 to chromatin. Promoter-bound METTL3 induces m6A modification within the coding region of the associated mRNA transcript, and enhances its translation by relieving ribosome stalling. We show that genes regulated by METTL3 in this way are necessary for acute myeloid leukaemia. Together, these data define METTL3 as a regulator of a chromatin-based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia.

Journal ArticleDOI
Simone Wahl, Alexander W. Drong, Benjamin Lehne, Marie Loh, William R. Scott, Sonja Kunze, Pei-Chien Tsai, Janina S. Ried, Weihua Zhang, Youwen Yang, Sili Tan, Giovanni Fiorito, Lude Franke, Simonetta Guarrera, Silva Kasela, Jennifer Kriebel, Rebecca C Richmond, Marco Adamo, Uzma Afzal, Mika Ala-Korpela, Benedetta Albetti, Ole Ammerpohl, Jane F. Apperley, Marian Beekman, Pier Alberto Bertazzi, S. Lucas Black, Christine Blancher, Marc Jan Bonder, Mario Brosch, Maren Carstensen-Kirberg, Anton J. M. de Craen, Simon de Lusignan, Abbas Dehghan, Mohamed Elkalaawy, Krista Fischer, Oscar H. Franco, Tom R. Gaunt, Jochen Hampe, Majid Hashemi, Aaron Isaacs, Andrew Jenkinson, Sujeet Jha, Norihiro Kato, Vittorio Krogh, Michael Laffan, Christa Meisinger, Thomas Meitinger, Zuan Yu Mok, Valeria Motta, Hong Kiat Ng, Zacharoula Nikolakopoulou, Georgios Nteliopoulos, Salvatore Panico, Natalia Pervjakova, Holger Prokisch, Wolfgang Rathmann, Michael Roden, Federica Rota, Michelle Ann Rozario, Johanna K. Sandling, Clemens Schafmayer, Katharina Schramm, Reiner Siebert, P. Eline Slagboom, Pasi Soininen, Lisette Stolk, Konstantin Strauch, E-Shyong Tai, Letizia Tarantini, Barbara Thorand, Ettje F. Tigchelaar, Rosario Tumino, André G. Uitterlinden, Cornelia M. van Duijn, Joyce B. J. van Meurs, Paolo Vineis, Ananda R. Wickremasinghe, Cisca Wijmenga, Tsun-Po Yang, Wei Yuan, Alexandra Zhernakova, Rachel L. Batterham, George Davey Smith, Panos Deloukas, Bastiaan T. Heijmans, Christian Herder, Albert Hofman, Cecilia M. Lindgren, Lili Milani, Pim van der Harst, Annette Peters, Thomas Illig, Caroline L Relton, Melanie Waldenberger, Marjo-Riitta Järvelin, Valentina Bollati, Richie Soong, Tim D. Spector, James Scott, Mark I. McCarthy, Paul Elliott, Jordana T. Bell, Giuseppe Matullo, Christian Gieger, Jaspal S. Kooner, Harald Grallert, John C. Chambers
05 Jan 2017-Nature
TL;DR: In this article, the authors used epigenome-wide association to show that body mass index (BMI), a key measure of adiposity, is associated with widespread changes in DNA methylation.
Abstract: Approximately 1.5 billion people worldwide are overweight or affected by obesity, and are at risk of developing type 2 diabetes, cardiovascular disease and related metabolic and inflammatory disturbances [1,2]. Although the mechanisms linking adiposity to associated clinical conditions are poorly understood, recent studies suggest that adiposity may influence DNA methylation [3,4,5,6], a key regulator of gene expression and molecular phenotype [7]. Here we use epigenome-wide association to show that body mass index (BMI; a key measure of adiposity) is associated with widespread changes in DNA methylation (187 genetic loci with P < 1 × 10^-7, range P = 9.2 × 10^-8 to 6.0 × 10^-46; n = 10,261 samples). Genetic association analyses demonstrate that the alterations in DNA methylation are predominantly the consequence of adiposity, rather than the cause. We find that methylation loci are enriched for functional genomic features in multiple tissues (P < 0.05), and show that sentinel methylation markers identify gene expression signatures at 38 loci (P < 9.0 × 10^-6, range P = 5.5 × 10^-6 to 6.1 × 10^-35, n = 1,785 samples). The methylation loci identify genes involved in lipid and lipoprotein metabolism, substrate transport and inflammatory pathways. Finally, we show that the disturbances in DNA methylation predict future development of type 2 diabetes (relative risk per 1 standard deviation increase in methylation risk score: 2.3 (2.07–2.56); P = 1.1 × 10^-54). Our results provide new insights into the biologic pathways influenced by adiposity, and may enable development of new strategies for prediction and prevention of type 2 diabetes and other adverse clinical consequences of obesity.

Journal ArticleDOI
TL;DR: It is asserted that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote the understanding of human biology and advance the efforts to improve health.
Abstract: The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

Journal ArticleDOI
TL;DR: The progress of the HPO project is reviewed, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.
Abstract: Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.

Journal ArticleDOI
TL;DR: A practical guide to help researchers design their first scRNA-seq studies, including introductory information on experimental hardware, protocol choice, quality control, data analysis and biological interpretation is presented.
Abstract: RNA sequencing (RNA-seq) is a genomic approach for the detection and quantitative analysis of messenger RNA molecules in a biological sample and is useful for studying cellular responses. RNA-seq has fueled much discovery and innovation in medicine over recent years. For practical reasons, the technique is usually conducted on samples comprising thousands to millions of cells. However, this has hindered direct assessment of the fundamental unit of biology—the cell. Since the first single-cell RNA-sequencing (scRNA-seq) study was published in 2009, many more have been conducted, mostly by specialist laboratories with unique skills in wet-lab single-cell genomics, bioinformatics, and computation. However, with the increasing commercial availability of scRNA-seq platforms, and the rapid ongoing maturation of bioinformatics approaches, a point has been reached where any biomedical researcher or clinician can use scRNA-seq to make exciting discoveries. In this review, we present a practical guide to help researchers design their first scRNA-seq studies, including introductory information on experimental hardware, protocol choice, quality control, data analysis and biological interpretation.

Journal ArticleDOI
Robert A. Scott, Laura J. Scott, Reedik Mägi, Letizia Marullo, +213 more authors (66 institutions)
01 Nov 2017-Diabetes
TL;DR: This article conducted a meta-analysis of genome-wide association data from 26,676 T2D case and 132,532 control subjects of European ancestry after imputation using the 1000 Genomes multiethnic reference panel.
Abstract: To characterize type 2 diabetes (T2D)-associated variation across the allele frequency spectrum, we conducted a meta-analysis of genome-wide association data from 26,676 T2D case and 132,532 control subjects of European ancestry after imputation using the 1000 Genomes multiethnic reference panel. Promising association signals were followed up in additional data sets (of 14,545 or 7,397 T2D case and 38,994 or 71,604 control subjects). We identified 13 novel T2D-associated loci (P < 5 × 10^-8), including variants near the GLP2R, GIP, and HLA-DQA1 genes. Our analysis brought the total number of independent T2D associations to 128 distinct signals at 113 loci. Despite substantially increased sample size and more complete coverage of low-frequency variation, all novel associations were driven by common single nucleotide variants. Credible sets of potentially causal variants were generally larger than those based on imputation with earlier reference panels, consistent with resolution of causal signals to common risk haplotypes. Stratification of T2D-associated loci based on T2D-related quantitative trait associations revealed tissue-specific enrichment of regulatory annotations in pancreatic islet enhancers for loci influencing insulin secretion and in adipocytes, monocytes, and hepatocytes for insulin action-associated loci. These findings highlight the predominant role played by common variants of modest effect and the diversity of biological mechanisms influencing T2D pathophysiology.
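
The T2D abstract above rests on a meta-analysis of genome-wide association data. One common way to pool per-study effect estimates for a single variant is fixed-effect inverse-variance weighting, sketched below. Whether the authors used exactly this scheme is an assumption, and the effect sizes and standard errors are made-up numbers for illustration.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance weighted pooling of (beta, standard_error) pairs.

    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical log-odds-ratio estimates for one variant from three studies.
studies = [(0.10, 0.05), (0.14, 0.04), (0.08, 0.10)]
beta, se = fixed_effect_meta(studies)
print(f"Pooled beta = {beta:.3f}, SE = {se:.3f}, z = {beta / se:.2f}")
```

Studies with smaller standard errors receive larger weights, so the pooled standard error is never larger than that of the most precise study, which is why combining cohorts can surface associations that no single study detects.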

Journal ArticleDOI
Christopher P. Nelson, Anuj Goel, Adam S. Butterworth, Stavroula Kanoni, Tom R. Webb, Eirini Marouli, Lingyao Zeng, Ioanna Ntalla, Florence Lai, Jemma C. Hopewell, Olga Giannakopoulou, Tao Jiang, Stephen E. Hamby, Emanuele Di Angelantonio, Themistocles L. Assimes, Erwin P. Bottinger, John C. Chambers, Robert Clarke, Colin N. A. Palmer, Richard M Cubbon, Patrick T. Ellinor, Raili Ermel, Evangelos Evangelou, Paul W. Franks, Christopher Grace, Dongfeng Gu, Aroon D. Hingorani, Joanna M. M. Howson, Erik Ingelsson, Adnan Kastrati, Thorsten Kessler, Theodosios Kyriakou, Terho Lehtimäki, Xiangfeng Lu, Yingchang Lu, Winfried März, Ruth McPherson, Andres Metspalu, Mar Pujades-Rodriguez, Arno Ruusalepp, Eric E. Schadt, Amand F. Schmidt, Michael J. Sweeting, Pierre Zalloua, Kamal Alghalayini, Bernard Keavney, Jaspal S. Kooner, Ruth J. F. Loos, Riyaz S. Patel, Martin K. Rutter, Maciej Tomaszewski, Ioanna Tzoulaki, Eleftheria Zeggini, Jeanette Erdmann, George Dedoussis, Johan L.M. Björkegren, CARDIoGRAMplusC4D, Heribert Schunkert, Martin Farrall, John Danesh, Nilesh J. Samani, Hugh Watkins, Panos Deloukas
TL;DR: This approach identified 13 new loci at genome-wide significance, 12 of which were on the previous list of loci meeting the 5% FDR threshold, thus providing strong support that the remaining loci identified by FDR represent genuine signals.
Abstract: Genome-wide association studies (GWAS) in coronary artery disease (CAD) had identified 66 loci at 'genome-wide significance' (P < 5 × 10⁻⁸) at the time of this analysis, but a much larger number of putative loci at a false discovery rate (FDR) of 5% (refs. 1,2,3,4). Here we leverage an interim release of UK Biobank (UKBB) data to evaluate the validity of the FDR approach. We tested a CAD phenotype inclusive of angina (SOFT; n_cases = 10,801) as well as a stricter definition without angina (HARD; n_cases = 6,482) and selected cases with the former phenotype to conduct a meta-analysis using the two most recent CAD GWAS. This approach identified 13 new loci at genome-wide significance, 12 of which were on our previous list of loci meeting the 5% FDR threshold, thus providing strong support that the remaining loci identified by FDR represent genuine signals. The 304 independent variants associated at 5% FDR in this study explain 21.2% of CAD heritability and identify 243 loci that implicate pathways in blood vessel morphogenesis as well as lipid metabolism, nitric oxide signaling and inflammation.
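
The contrast between a fixed genome-wide significance cutoff and a 5% FDR threshold can be illustrated with a short sketch. It assumes the Benjamini-Hochberg step-up procedure, which the abstract does not name, and the p-values are invented for illustration only.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans marking which tests pass a Benjamini-Hochberg FDR cutoff.

    Assumption: the abstract does not say which FDR procedure was used;
    Benjamini-Hochberg is chosen here purely as a common example.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            passed[idx] = True
    return passed


GENOME_WIDE = 5e-8  # fixed threshold quoted in the abstract

# Hypothetical p-values, not taken from the study.
p_values = [1e-9, 3e-8, 2e-6, 0.004, 0.02, 0.2, 0.5, 0.9]

gw_hits = [p < GENOME_WIDE for p in p_values]
fdr_hits = benjamini_hochberg(p_values, fdr=0.05)

print("genome-wide significant:", sum(gw_hits))
print("passing 5% FDR:", sum(fdr_hits))
```

With these toy values, more tests pass the FDR cutoff than the fixed threshold, which mirrors the abstract's point that FDR yields a larger list of putative loci.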

Journal ArticleDOI
TL;DR: A mesenchymal sub-population with stem cell-like characteristics that gives rise to both lineages and, at the same time, acts as a principal component of the hematopoietic niche by promoting competitive repopulation following lethal irradiation is described.

Journal ArticleDOI
22 May 2017-Nature
TL;DR: It is found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for dozens of millions of years.
Abstract: The domesticated sunflower, Helianthus annuus L., is a global oil crop that has promise for climate change adaptation, because it can maintain stable yields across a wide variety of environmental conditions, including drought. Even greater resilience is achievable through the mining of resistance alleles from compatible wild sunflower relatives, including numerous extremophile species. Here we report a high-quality reference for the sunflower genome (3.6 gigabases), together with extensive transcriptomic data from vegetative and floral organs. The genome mostly consists of highly similar, related sequences and required single-molecule real-time sequencing technologies for successful assembly. Genome analyses enabled the reconstruction of the evolutionary history of the Asterids, further establishing the existence of a whole-genome triplication at the base of the Asterids II clade and a sunflower-specific whole-genome duplication around 29 million years ago. An integrative approach combining quantitative genetics, expression and diversity data permitted development of comprehensive gene networks for two major breeding traits, flowering time and oil metabolism, and revealed new candidate genes in these networks. We found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for dozens of millions of years. This genome represents a cornerstone for future research programs aiming to exploit genetic diversity to improve biotic and abiotic stress resistance and oil production, while also considering agricultural constraints and human nutritional needs.

Journal ArticleDOI
26 Oct 2017-Nature
TL;DR: As a project to map all the cells in the human body gets under way, Regev, Teichmann and colleagues outline some key challenges for the effort.
Abstract: As an ambitious project to map all the cells in the human body gets officially under way, Aviv Regev, Sarah Teichmann and colleagues outline some key challenges.

Journal ArticleDOI
TL;DR: In this article, using single-cell RNA sequencing, the authors determined the transcriptome of more than 1,600 individual microglia cells isolated from the hippocampus of a mouse model of severe neurodegeneration with AD-like phenotypes and of control mice at multiple time points during progression of the disease.

Journal ArticleDOI
TL;DR: This analysis provides an integrated framework for comparing scRNA-seq protocols and compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation.
Abstract: Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the variety of available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.
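
The idea behind counting unique molecular identifiers (UMIs) can be sketched as follows. This is a minimal illustration and not the implementation of the tool linked in the abstract; the read tuples, barcodes and the `count_umis` helper are hypothetical.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples. Reads that
    share all three values are treated as PCR duplicates of one molecule.
    This helper is hypothetical and ignores UMI sequencing errors.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Invented example reads.
reads = [
    ("CELL_A", "Actb", "AACG"),
    ("CELL_A", "Actb", "AACG"),  # duplicate of the read above
    ("CELL_A", "Actb", "TTGC"),
    ("CELL_A", "Gapdh", "AACG"),
    ("CELL_B", "Actb", "GGTA"),
]

counts = count_umis(reads)
for (cell, gene), n in sorted(counts.items()):
    print(cell, gene, n)
```

Collapsing on the UMI rather than counting raw reads is what lets molecule counts be compared across protocols with different amplification depths.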

Journal ArticleDOI
TL;DR: Several lines of analysis indicate that clones seeding metastasis or relapse disseminate late from primary tumors, but continue to acquire mutations, mostly accessing the same mutational processes active in the primary tumor.

Journal ArticleDOI
Dajiang J. Liu1, Gina M. Peloso2,3, Haojie Yu4, +285 more; Institutions (91)
TL;DR: It is found that beta-thalassemia trait carriers displayed lower TC and were protected from coronary artery disease (CAD), and only some mechanisms of lowering LDL-C appeared to increase risk for type 2 diabetes (T2D); and TG-lowering alleles involved in hepatic production of TG-rich lipoproteins tracked with higher liver fat, higher risk for T2D, and lower risk for CAD.
Abstract: We screened variants on an exome-focused genotyping array in >300,000 participants (replication in >280,000 participants) and identified 444 independent variants in 250 loci significantly associated with total cholesterol (TC), high-density-lipoprotein cholesterol (HDL-C), low-density-lipoprotein cholesterol (LDL-C), and/or triglycerides (TG). At two loci (JAK2 and A1CF), experimental analysis in mice showed lipid changes consistent with the human data. We also found that: (i) beta-thalassemia trait carriers displayed lower TC and were protected from coronary artery disease (CAD); (ii) excluding the CETP locus, there was not a predictable relationship between plasma HDL-C and risk for age-related macular degeneration; (iii) only some mechanisms of lowering LDL-C appeared to increase risk for type 2 diabetes (T2D); and (iv) TG-lowering alleles involved in hepatic production of TG-rich lipoproteins (TM6SF2 and PNPLA3) tracked with higher liver fat, higher risk for T2D, and lower risk for CAD, whereas TG-lowering alleles involved in peripheral lipolysis (LPL and ANGPTL4) had no effect on liver fat but decreased risks for both T2D and CAD.

Journal ArticleDOI
15 Jun 2017-Nature
TL;DR: This study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer.
Abstract: Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. We also present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.

Journal ArticleDOI
Richard Anney1,2, Stephan Ripke3,4, +211 more; Institutions (77)
TL;DR: A significant genetic correlation with schizophrenia and an association of ASD with several neurodevelopmental-related genes such as EXT1, ASTN2, MACROD2, and HDAC4 are identified.
Abstract: Background: Over the past decade genome-wide association studies (GWAS) have been applied to aid in the understanding of the biology of traits. The success of this approach is governed by the underlying effect sizes carried by the true risk variants and the corresponding statistical power to observe such effects given the study design and sample size under investigation. Previous ASD GWAS have identified genome-wide significant (GWS) risk loci; however, these studies had only low statistical power to identify GWS loci at the lower effect sizes (odds ratio (OR) < 1.15). Methods: We conducted a large-scale coordinated international collaboration to combine independent genotyping data to improve the statistical power and aid in robust discovery of GWS loci. This study uses genome-wide genotyping data from a discovery sample (7387 ASD cases and 8567 controls) followed by meta-analysis of summary statistics from two replication sets (7783 ASD cases and 11359 controls; and 1369 ASD cases and 137308 controls). Results: We observe a GWS locus at 10q24.32 that overlaps several genes including PITX3, which encodes a transcription factor identified as playing a role in neuronal differentiation, and CUEDC2, previously reported to be associated with social skills in an independent population cohort. We also observe overlap with regions previously implicated in schizophrenia, which was further supported by a strong genetic correlation between these disorders (Rg = 0.23; P = 9 × 10⁻⁶). We further combined these Psychiatric Genomics Consortium (PGC) ASD GWAS data with the recent PGC schizophrenia GWAS to identify additional regions which may be important in a common neurodevelopmental phenotype and identified 12 novel GWS loci. These include loci previously implicated in ASD such as FOXP1 at 3p13, ATP2B2 at 3p25.3, and a ‘neurodevelopmental hub’ on chromosome 8p11.23.
Conclusions: This study is an important step in the ongoing endeavour to identify the loci which underpin the common variant signal in ASD. In addition to novel GWS loci, we have identified a significant genetic correlation with schizophrenia and association of ASD with several neurodevelopmental-related genes such as EXT1, ASTN2, MACROD2, and HDAC4.
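
Combining summary statistics across a discovery sample and replication sets can be sketched with an inverse-variance-weighted fixed-effect meta-analysis. The abstract does not state which meta-analysis model was used, so that choice is an assumption, and the effect sizes and standard errors below are invented.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    `betas` are per-study effect estimates (e.g. log odds ratios) and `ses`
    their standard errors. Returns the pooled estimate, its standard error
    and the z-score. Assumption: a fixed-effect model is used for illustration.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical log odds ratios and standard errors for one variant
# in a discovery sample and two replication sets.
betas = [0.10, 0.14, 0.08]
ses = [0.04, 0.05, 0.10]

beta, se, z = fixed_effect_meta(betas, ses)
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

The pooled standard error is smaller than that of any single study, which is the gain in statistical power that motivates combining cohorts.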

Journal ArticleDOI
TL;DR: WormBase ParaSite (http://parasite.wormbase.org) is a portal for the analysis of helminth genomic data, covering nematodes and platyhelminths.

Journal ArticleDOI
13 Jul 2017-Nature
TL;DR: The results of this study suggest that high-resolution fine-mapping in large samples can convert many discoveries from genome-wide association studies into statistically convincing causal variants, providing a powerful substrate for experimental elucidation of disease mechanisms.
Abstract: Inflammatory bowel diseases are chronic gastrointestinal inflammatory disorders that affect millions of people worldwide. Genome-wide association studies have identified 200 inflammatory bowel disease-associated loci, but few have been conclusively resolved to specific functional variants. Here we report fine-mapping of 94 inflammatory bowel disease loci using high-density genotyping in 67,852 individuals. We pinpoint 18 associations to a single causal variant with greater than 95% certainty, and an additional 27 associations to a single variant with greater than 50% certainty. These 45 variants are significantly enriched for protein-coding changes (n = 13), direct disruption of transcription-factor binding sites (n = 3), and tissue-specific epigenetic marks (n = 10), with the last category showing enrichment in specific immune cells among associations stronger in Crohn's disease and in gut mucosa among associations stronger in ulcerative colitis. The results of this study suggest that high-resolution fine-mapping in large samples can convert many discoveries from genome-wide association studies into statistically convincing causal variants, providing a powerful substrate for experimental elucidation of disease mechanisms.
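
Resolving an association "to a single variant with greater than 95% certainty" can be read in terms of per-variant posterior probabilities and credible sets. The sketch below builds a credible set from such probabilities; the variant names and numbers are hypothetical, and how the posteriors are obtained is outside its scope.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest set of variants whose posterior probabilities
    of being causal sum to at least `coverage`.

    `posteriors` maps variant IDs to posterior probabilities that are
    assumed to sum to 1 within the locus (one causal variant per signal).
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen


# Hypothetical loci: one well resolved, one not.
locus_a = {"var1": 0.97, "var2": 0.02, "var3": 0.01}
locus_b = {"var1": 0.55, "var2": 0.30, "var3": 0.12, "var4": 0.03}

set_a = credible_set(locus_a)
set_b = credible_set(locus_b)
print("locus A 95% credible set:", set_a)
print("locus B 95% credible set:", set_b)
```

A locus like the first, where one variant alone exceeds the coverage level, corresponds to the single-variant resolution the abstract describes; the second needs several variants to reach the same coverage.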

Journal ArticleDOI
04 Sep 2017
TL;DR: A new tool is presented, ARIBA, that identifies AMR-associated genes and single nucleotide polymorphisms directly from short reads, and generates detailed and customizable output.
Abstract: Antimicrobial resistance (AMR) is one of the major threats to human and animal health worldwide, yet few high-throughput tools exist to analyse and predict the resistance of a bacterial isolate from sequencing data. Here we present a new tool, ARIBA, that identifies AMR-associated genes and single nucleotide polymorphisms directly from short reads, and generates detailed and customizable output. The accuracy and advantages of ARIBA over other tools are demonstrated on three datasets from Gram-positive and Gram-negative bacteria, with ARIBA outperforming existing methods.

Journal ArticleDOI
13 Jul 2017-Cell
TL;DR: The genomes of malaria parasites contain many genes of unknown function and the level of genetic redundancy in a single-celled organism may reflect the degree of environmental variation it experiences, which helps rationalize both the relative successes of drugs and the greater difficulty of making an effective vaccine.