Showing papers by "Broad Institute" published in 2016


Journal ArticleDOI
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel1, Eric Vallabh Minikel2, Kaitlin E. Samocha, Eric Banks1, Timothy Fennell1, Anne H. O’Donnell-Luria3, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria1, James S. Ware, Andrew J. Hill4, Andrew J. Hill2, Andrew J. Hill1, Beryl B. Cummings1, Beryl B. Cummings2, Taru Tukiainen1, Taru Tukiainen2, Daniel P. Birnbaum1, Jack A. Kosmicki, Laramie E. Duncan1, Laramie E. Duncan2, Karol Estrada1, Karol Estrada2, Fengmei Zhao1, Fengmei Zhao2, James Zou1, Emma Pierce-Hoffman1, Emma Pierce-Hoffman2, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo1, Ron Do, Jason Flannick2, Jason Flannick1, Menachem Fromer, Laura D. Gauthier1, Jackie Goldstein1, Jackie Goldstein2, Namrata Gupta1, Daniel P. Howrigan2, Daniel P. Howrigan1, Adam Kiezun1, Mitja I. Kurki1, Mitja I. Kurki2, Ami Levy Moonshine1, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso2, Gina M. Peloso1, Ryan Poplin1, Manuel A. Rivas1, Valentin Ruano-Rubio1, Samuel A. Rose1, Douglas M. Ruderfer8, Khalid Shakir1, Peter D. Stenson6, Christine Stevens1, Brett Thomas1, Brett Thomas2, Grace Tiao1, María Teresa Tusié-Luna, Ben Weisburd1, Hong-Hee Won9, Dongmei Yu, David Altshuler10, David Altshuler1, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly1, Roberto Elosua, Jose C. Florez1, Jose C. Florez2, Stacey Gabriel1, Gad Getz1, Gad Getz2, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll1, Steven A. McCarroll2, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale1, Benjamin M. Neale2, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan21, Patrick F. Sullivan14, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins17, Hugh Watkins16, James G. Wilson24, Mark J. Daly2, Mark J. Daly1, Daniel G. MacArthur1, Daniel G. MacArthur2 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations


Journal ArticleDOI
Daniel J. Klionsky1, Kotb Abdelmohsen2, Akihisa Abe3, Joynal Abedin4  +2519 moreInstitutions (695)
TL;DR: In this paper, the authors present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes.
Abstract: In 2008 we published the first set of guidelines for standardizing research in autophagy. Since then, research on this topic has continued to accelerate, and many new scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Accordingly, it is important to update these guidelines for monitoring autophagy in different organisms. Various reviews have described the range of assays that have been used for this purpose. Nevertheless, there continues to be confusion regarding acceptable methods to measure autophagy, especially in multicellular eukaryotes. For example, a key point that needs to be emphasized is that there is a difference between measurements that monitor the numbers or volume of autophagic elements (e.g., autophagosomes or autolysosomes) at any stage of the autophagic process versus those that measure flux through the autophagy pathway (i.e., the complete process including the amount and rate of cargo sequestered and degraded). In particular, a block in macroautophagy that results in autophagosome accumulation must be differentiated from stimuli that increase autophagic activity, defined as increased autophagy induction coupled with increased delivery to, and degradation within, lysosomes (in most higher eukaryotes and some protists such as Dictyostelium) or the vacuole (in plants and fungi). In other words, it is especially important that investigators new to the field understand that the appearance of more autophagosomes does not necessarily equate with more autophagy. In fact, in many cases, autophagosomes accumulate because of a block in trafficking to lysosomes without a concomitant change in autophagosome biogenesis, whereas an increase in autolysosomes may reflect a reduction in degradative activity. It is worth emphasizing here that lysosomal digestion is a stage of autophagy and evaluating its competence is a crucial part of the evaluation of autophagic flux, or complete autophagy. 
Here, we present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes. These guidelines are not meant to be a formulaic set of rules, because the appropriate assays depend in part on the question being asked and the system being used. In addition, we emphasize that no individual assay is guaranteed to be the most appropriate one in every situation, and we strongly recommend the use of multiple assays to monitor autophagy. Along these lines, because of the potential for pleiotropic effects due to blocking autophagy through genetic manipulation, it is imperative to target by gene knockout or RNA interference more than one autophagy-related protein. In addition, some individual Atg proteins, or groups of proteins, are involved in other cellular pathways implying that not all Atg proteins can be used as a specific marker for an autophagic process. In these guidelines, we consider these various methods of assessing autophagy and what information can, or cannot, be obtained from them. Finally, by discussing the merits and limits of particular assays, we hope to encourage technical innovation in the field.

5,187 citations


Journal ArticleDOI
08 Apr 2016-Science
TL;DR: Single-cell RNA sequencing of 4645 cells from 19 melanoma patients begins to unravel the cellular ecosystem of tumors and shows how single-cell genomics offers insights with implications for both targeted and immune therapies.
Abstract: To explore the distinct genotypic and phenotypic states of melanoma tumors, we applied single-cell RNA sequencing (RNA-seq) to 4645 single cells isolated from 19 patients, profiling malignant, immune, stromal, and endothelial cells. Malignant cells within the same tumor displayed transcriptional heterogeneity associated with the cell cycle, spatial context, and a drug-resistance program. In particular, all tumors harbored malignant cells from two distinct transcriptional cell states, such that tumors characterized by high levels of the MITF transcription factor also contained cells with low MITF and elevated levels of the AXL kinase. Single-cell analyses suggested distinct tumor microenvironmental patterns, including cell-to-cell interactions. Analysis of tumor-infiltrating T cells revealed exhaustion programs, their connection to T cell activation and clonal expansion, and their variability across patients. Overall, we begin to unravel the cellular ecosystem of tumors and how single-cell genomics offers insights with implications for both targeted and immune therapies.

3,061 citations


Journal ArticleDOI
TL;DR: Recently devised sgRNA design rules are used to create human and mouse genome-wide libraries, which produce improved results in positive and negative selection screens, and a metric to predict off-target sites is developed.
Abstract: CRISPR-Cas9-based genetic screens are a powerful new tool in biology. By simply altering the sequence of the single-guide RNA (sgRNA), one can reprogram Cas9 to target different sites in the genome with relative ease, but the on-target activity and off-target effects of individual sgRNAs can vary widely. Here, we use recently devised sgRNA design rules to create human and mouse genome-wide libraries, perform positive and negative selection screens and observe that the use of these rules produced improved results. Additionally, we profile the off-target activity of thousands of sgRNAs and develop a metric to predict off-target sites. We incorporate these findings from large-scale, empirical data to improve our computational design rules and create optimized sgRNA libraries that maximize on-target activity and minimize off-target effects to enable more effective and efficient genetic screens and genome engineering.

2,866 citations


Journal ArticleDOI
25 Mar 2016-Science
TL;DR: The impact of neoantigen intratumor heterogeneity (ITH) on antitumor immunity is explored, and a relationship between clonal neoantigen burden and overall survival in primary lung adenocarcinomas is demonstrated.
Abstract: As tumors grow, they acquire mutations, some of which create neoantigens that influence the response of patients to immune checkpoint inhibitors. We explored the impact of neoantigen intratumor heterogeneity (ITH) on antitumor immunity. Through integrated analysis of ITH and neoantigen burden, we demonstrate a relationship between clonal neoantigen burden and overall survival in primary lung adenocarcinomas. CD8+ tumor-infiltrating lymphocytes reactive to clonal neoantigens were identified in early-stage non–small cell lung cancer and expressed high levels of PD-1. Sensitivity to PD-1 and CTLA-4 blockade in patients with advanced NSCLC and melanoma was enhanced in tumors enriched for clonal neoantigens. T cells recognizing clonal neoantigens were detectable in patients with durable clinical benefit. Cytotoxic chemotherapy–induced subclonal neoantigens, contributing to an increased mutational load, were enriched in certain poor responders. These data suggest that neoantigen heterogeneity may influence immune surveillance and support therapeutic developments targeting clonal neoantigens.

2,284 citations


Journal ArticleDOI
Shane A. McCarthy1, Sayantan Das2, Warren W. Kretzschmar3, Olivier Delaneau4, Andrew R. Wood5, Alexander Teumer6, Hyun Min Kang2, Christian Fuchsberger2, Petr Danecek1, Kevin Sharp3, Yang Luo1, C Sidore7, Alan Kwong2, Nicholas J. Timpson8, Seppo Koskinen, Scott I. Vrieze9, Laura J. Scott2, He Zhang2, Anubha Mahajan3, Jan H. Veldink, Ulrike Peters10, Ulrike Peters11, Carlos N. Pato12, Cornelia M. van Duijn13, Christopher E. Gillies2, Ilaria Gandin14, Massimo Mezzavilla, Arthur Gilly1, Massimiliano Cocca14, Michela Traglia, Andrea Angius7, Jeffrey C. Barrett1, D.I. Boomsma15, Kari Branham2, Gerome Breen16, Gerome Breen17, Chad M. Brummett2, Fabio Busonero7, Harry Campbell18, Andrew T. Chan19, Sai Chen2, Emily Y. Chew20, Francis S. Collins20, Laura J Corbin8, George Davey Smith8, George Dedoussis21, Marcus Dörr6, Aliki-Eleni Farmaki21, Luigi Ferrucci20, Lukas Forer22, Ross M. Fraser2, Stacey Gabriel23, Shawn Levy, Leif Groop24, Leif Groop25, Tabitha A. Harrison11, Andrew T. Hattersley5, Oddgeir L. Holmen26, Kristian Hveem26, Matthias Kretzler2, James Lee27, Matt McGue28, Thomas Meitinger29, David Melzer5, Josine L. Min8, Karen L. Mohlke30, John B. Vincent31, Matthias Nauck6, Deborah A. Nickerson10, Aarno Palotie23, Aarno Palotie19, Michele T. Pato12, Nicola Pirastu14, Melvin G. McInnis2, J. Brent Richards32, J. Brent Richards16, Cinzia Sala, Veikko Salomaa, David Schlessinger20, Sebastian Schoenherr22, P. Eline Slagboom33, Kerrin S. Small16, Tim D. Spector16, Dwight Stambolian34, Marcus A. Tuke5, Jaakko Tuomilehto, Leonard H. van den Berg, Wouter van Rheenen, Uwe Völker6, Cisca Wijmenga35, Daniela Toniolo, Eleftheria Zeggini1, Paolo Gasparini14, Matthew G. Sampson2, James F. Wilson18, Timothy M. Frayling5, Paul I.W. de Bakker36, Morris A. Swertz35, Steven A. McCarroll19, Charles Kooperberg11, Annelot M. Dekker, David Altshuler, Cristen J. Willer2, William G. Iacono28, Samuli Ripatti24, Nicole Soranzo27, Nicole Soranzo1, Klaudia Walter1, Anand Swaroop20, Francesco Cucca7, Carl A. Anderson1, Richard M. Myers, Michael Boehnke2, Mark I. McCarthy3, Mark I. McCarthy37, Richard Durbin1, Gonçalo R. Abecasis2, Jonathan Marchini3
TL;DR: A reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies.
Abstract: We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

2,149 citations


Journal ArticleDOI
06 May 2016-Science
TL;DR: In mouse models of Alzheimer's disease (AD), the complement-dependent pathway and microglia that prune excess synapses in development are inappropriately activated and mediate early synapse loss, a feature of AD that correlates with cognitive decline.
Abstract: Synapse loss in Alzheimer’s disease (AD) correlates with cognitive decline. Involvement of microglia and complement in AD has been attributed to neuroinflammation, prominent late in disease. Here we show in mouse models that complement and microglia mediate synaptic loss early in AD. C1q, the initiating protein of the classical complement cascade, is increased and associated with synapses before overt plaque deposition. Inhibition of C1q, C3, or the microglial complement receptor CR3 reduces the number of phagocytic microglia, as well as the extent of early synapse loss. C1q is necessary for the toxic effects of soluble β-amyloid (Aβ) oligomers on synapses and hippocampal long-term potentiation. Finally, microglia in adult brains engulf synaptic material in a CR3-dependent process when exposed to soluble Aβ oligomers. Together, these findings suggest that the complement-dependent pathway and microglia that prune excess synapses in development are inappropriately activated and mediate synapse loss in AD.

1,997 citations


Journal ArticleDOI
11 Feb 2016-Nature
TL;DR: It is found that many structurally diverse alleles of the complement component 4 (C4) genes generated widely varying levels of C4A and C4B expression in the brain, with each common C4 allele associating with schizophrenia in proportion to its tendency to generate greater expression of C4A.
Abstract: Schizophrenia is a heritable brain illness with unknown pathogenic mechanisms. Schizophrenia's strongest genetic association at a population level involves variation in the major histocompatibility complex (MHC) locus, but the genes and molecular mechanisms accounting for this have been challenging to identify. Here we show that this association arises in part from many structurally diverse alleles of the complement component 4 (C4) genes. We found that these alleles generated widely varying levels of C4A and C4B expression in the brain, with each common C4 allele associating with schizophrenia in proportion to its tendency to generate greater expression of C4A. Human C4 protein localized to neuronal synapses, dendrites, axons, and cell bodies. In mice, C4 mediated synapse elimination during postnatal development. These results implicate excessive complement activity in the development of schizophrenia and may help explain the reduced numbers of synapses in the brains of individuals with schizophrenia.

1,826 citations


Journal ArticleDOI
05 Aug 2016-Science
TL;DR: LshC2c2 is an RNA-guided RNase that requires the activity of its two HEPN domains, suggesting previously unidentified mechanisms of RNA targeting and degradation by CRISPR systems.
Abstract: The clustered regularly interspaced short palindromic repeat (CRISPR)-CRISPR-associated genes (Cas) adaptive immune system defends microbes against foreign genetic elements via DNA or RNA-DNA interference. We characterize the class 2 type VI CRISPR-Cas effector C2c2 and demonstrate its RNA-guided ribonuclease function. C2c2 from the bacterium Leptotrichia shahii provides interference against RNA phage. In vitro biochemical analysis shows that C2c2 is guided by a single CRISPR RNA and can be programmed to cleave single-stranded RNA targets carrying complementary protospacers. In bacteria, C2c2 can be programmed to knock down specific mRNAs. Cleavage is mediated by catalytic residues in the two conserved Higher Eukaryotes and Prokaryotes Nucleotide-binding (HEPN) domains, mutations of which generate catalytically inactive RNA-binding proteins. These results broaden our understanding of CRISPR-Cas systems and suggest that C2c2 can be used to develop new RNA-targeting tools.

1,522 citations


Journal ArticleDOI
TL;DR: A powerful strategy that integrates gene expression measurements with summary association statistics from large-scale genome-wide association studies (GWAS) to identify genes whose cis-regulated expression is associated with complex traits is introduced.
Abstract: Many genetic variants influence complex traits by modulating gene expression, thus altering the abundance of one or multiple proteins. Here we introduce a powerful strategy that integrates gene expression measurements with summary association statistics from large-scale genome-wide association studies (GWAS) to identify genes whose cis-regulated expression is associated with complex traits. We leverage expression imputation from genetic data to perform a transcriptome-wide association study (TWAS) to identify significant expression-trait associations. We applied our approaches to expression data from blood and adipose tissue measured in ∼ 3,000 individuals overall. We imputed gene expression into GWAS data from over 900,000 phenotype measurements to identify 69 new genes significantly associated with obesity-related traits (BMI, lipids and height). Many of these genes are associated with relevant phenotypes in the Hybrid Mouse Diversity Panel. Our results showcase the power of integrating genotype, gene expression and phenotype to gain insights into the genetic basis of complex traits.

1,473 citations
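The summary-statistic association test that the abstract above describes can be illustrated with a small sketch. It assumes the commonly used form of the TWAS statistic, in which per-SNP GWAS z-scores are combined with expression-prediction weights and scaled by the LD (SNP correlation) matrix: Z = w·z / sqrt(wᵀ Σ w). The function name `twas_z` and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Combine per-SNP GWAS z-scores into a single gene-level z-score.

    weights : expression-prediction weight for each SNP (hypothetical values)
    gwas_z  : GWAS summary z-score for each SNP
    ld      : SNP-by-SNP correlation (LD) matrix as a list of lists
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")

    # Numerator: weighted sum of the GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))

    # Denominator: variance of that weighted sum under the null, w' * LD * w.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' * LD * w must be positive")

    return numerator / math.sqrt(variance)


# Toy example: two SNPs, first uncorrelated, then with r = 0.5.
z_independent = twas_z([0.5, 0.5], [2.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
z_correlated = twas_z([0.5, 0.5], [2.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
```

With correlated SNPs the denominator grows, so the same weighted sum yields a smaller gene-level z-score; this is why the LD matrix has to enter the statistic.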


Journal ArticleDOI
02 Jun 2016-Nature
TL;DR: It is demonstrated that proteogenomic analysis of breast cancer elucidates functional consequences of somatic mutations, narrows candidate nominations for driver genes within large deletions and amplified regions, and identifies therapeutic targets.
Abstract: Somatic mutations have been extensively characterized in breast cancer, but the effects of these genetic alterations on the proteomic landscape remain poorly understood. Here we describe quantitative mass-spectrometry-based proteomic and phosphoproteomic analyses of 105 genomically annotated breast cancers, of which 77 provided high-quality data. Integrated analyses provided insights into the somatic cancer genome including the consequences of chromosomal loss, such as the 5q deletion characteristic of basal-like breast cancer. Interrogation of the 5q trans-effects against the Library of Integrated Network-based Cellular Signatures, connected loss of CETN3 and SKP1 to elevated expression of epidermal growth factor receptor (EGFR), and SKP1 loss also to increased SRC tyrosine kinase. Global proteomic data confirmed a stromal-enriched group of proteins in addition to basal and luminal clusters, and pathway analysis of the phosphoproteome identified a G-protein-coupled receptor cluster that was not readily identified at the mRNA level. In addition to ERBB2, other amplicon-associated highly phosphorylated kinases were identified, including CDK12, PAK1, PTK2, RIPK2 and TLK2. We demonstrate that proteogenomic analysis of breast cancer elucidates the functional consequences of somatic mutations, narrows candidate nominations for driver genes within large deletions and amplified regions, and identifies therapeutic targets.

Journal ArticleDOI
29 Apr 2016-Science
TL;DR: Deep sequencing of the gut microbiomes of 1135 participants from a Dutch population-based cohort shows relations between the microbiome and 126 exogenous and intrinsic host factors, including 31 intrinsic factors, 12 diseases, 19 drug groups, 4 smoking categories, and 60 dietary factors; these results are an important step toward a better understanding of environment-diet-microbe-host interactions.
Abstract: Deep sequencing of the gut microbiomes of 1135 participants from a Dutch population-based cohort shows relations between the microbiome and 126 exogenous and intrinsic host factors, including 31 intrinsic factors, 12 diseases, 19 drug groups, 4 smoking categories, and 60 dietary factors. These factors collectively explain 18.7% of the variation seen in the interindividual distance of microbial composition. We could associate 110 factors to 125 species and observed that fecal chromogranin A (CgA), a protein secreted by enteroendocrine cells, was exclusively associated with 61 microbial species whose abundance collectively accounted for 53% of microbial composition. Low CgA concentrations were seen in individuals with a more diverse microbiome. These results are an important step toward a better understanding of environment-diet-microbe-host interactions.

Journal ArticleDOI
TL;DR: A new phasing algorithm, Eagle2, is introduced that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform.
Abstract: Po-Ru Loh, Alkes Price and colleagues present Eagle2, a reference-based phasing algorithm that allows for highly accurate and efficient phasing of genotypes across a broad range of cohort sizes. They demonstrate an approximately 10% improvement in accuracy and 20% improvement in speed compared to a competing method, SHAPEIT2.
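The positional Burrows-Wheeler transform mentioned in the TL;DR can be illustrated with a minimal sketch of its core step: keeping haplotypes sorted by their reversed prefixes as sites are processed one at a time. This is the basic PBWT ordering only, under the assumption of biallelic 0/1 haplotypes; it is not Eagle2's data structure or phasing model, and the function name `pbwt_orderings` and the toy panel are hypothetical.

```python
def pbwt_orderings(haplotypes):
    """Return, for each site, the haplotype indices sorted by reversed prefix.

    haplotypes : list of equal-length lists of 0/1 alleles (one list per haplotype)
    The ordering after site k sorts haplotypes by alleles k, k-1, ..., 0, so
    haplotypes sharing a long match ending at site k end up adjacent.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same number of sites")

    order = list(range(len(haplotypes)))
    orderings = []
    for k in range(n_sites):
        # Stable partition of the previous ordering by the allele at site k:
        # zeros first, then ones, each group keeping its earlier relative order.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        orderings.append(order)
    return orderings


# Toy reference panel: haplotypes 0 and 3 are identical.
panel = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
orderings = pbwt_orderings(panel)
```

Each update is a single linear pass over the panel, which is what makes this ordering attractive for very large reference panels.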

Journal ArticleDOI
TL;DR: Several definitions of a ‘healthy microbiome’ that have emerged are reviewed, along with the current understanding of the ranges of healthy microbial diversity and gaps, such as the characterization of molecular function and the development of ecological therapies, to be addressed in the future.
Abstract: Humans are virtually identical in their genetic makeup, yet the small differences in our DNA give rise to tremendous phenotypic diversity across the human population. By contrast, the metagenome of the human microbiome—the total DNA content of microbes inhabiting our bodies—is quite a bit more variable, with only a third of its constituent genes found in a majority of healthy individuals. Understanding this variability in the “healthy microbiome” has thus been a major challenge in microbiome research, dating back at least to the 1960s, continuing through the Human Microbiome Project and beyond. Cataloguing the necessary and sufficient sets of microbiome features that support health, and the normal ranges of these features in healthy populations, is an essential first step to identifying and correcting microbial configurations that are implicated in disease. Toward this goal, several population-scale studies have documented the ranges and diversity of both taxonomic compositions and functional potentials normally observed in the microbiomes of healthy populations, along with possible driving factors such as geography, diet, and lifestyle. Here, we review several definitions of a ‘healthy microbiome’ that have emerged, the current understanding of the ranges of healthy microbial diversity, and gaps such as the characterization of molecular function and the development of ecological therapies to be addressed in the future.

Journal ArticleDOI
Swapan Mallick1, Swapan Mallick2, Swapan Mallick3, Heng Li1, Mark Lipson2, Iain Mathieson2, Melissa Gymrek, Fernando Racimo4, Mengyao Zhao1, Mengyao Zhao2, Mengyao Zhao3, Niru Chennagiri1, Niru Chennagiri2, Niru Chennagiri3, Susanne Nordenfelt1, Susanne Nordenfelt2, Susanne Nordenfelt3, Arti Tandon2, Arti Tandon1, Pontus Skoglund2, Pontus Skoglund1, Iosif Lazaridis2, Iosif Lazaridis1, Sriram Sankararaman2, Sriram Sankararaman5, Sriram Sankararaman1, Qiaomei Fu6, Qiaomei Fu1, Qiaomei Fu2, Nadin Rohland2, Nadin Rohland1, Gabriel Renaud7, Yaniv Erlich8, Thomas Willems9, Carla Gallo10, Jeffrey P. Spence4, Yun S. Song11, Yun S. Song4, Giovanni Poletti10, Francois Balloux12, George van Driem13, Peter de Knijff14, Irene Gallego Romero15, Aashish R. Jha16, Doron M. Behar17, Claudio M. Bravi18, Cristian Capelli19, Tor Hervig20, Andrés Moreno-Estrada, Olga L. Posukh21, Elena Balanovska, Oleg Balanovsky22, Sena Karachanak-Yankova23, Hovhannes Sahakyan17, Hovhannes Sahakyan24, Draga Toncheva23, Levon Yepiskoposyan24, Chris Tyler-Smith25, Yali Xue25, M. Syafiq Abdullah26, Andres Ruiz-Linares12, Cynthia M. Beall27, Anna Di Rienzo16, Choongwon Jeong16, Elena B. Starikovskaya, Ene Metspalu28, Ene Metspalu17, Jüri Parik17, Richard Villems29, Richard Villems28, Richard Villems17, Brenna M. Henn30, Ugur Hodoglugil31, Robert W. Mahley32, Antti Sajantila33, George Stamatoyannopoulos34, Joseph Wee, Rita Khusainova35, Elza Khusnutdinova35, Sergey Litvinov17, Sergey Litvinov35, George Ayodo36, David Comas37, Michael F. Hammer38, Toomas Kivisild39, Toomas Kivisild17, William Klitz, Cheryl A. Winkler40, Damian Labuda41, Michael J. Bamshad34, Lynn B. Jorde42, Sarah A. Tishkoff11, W. Scott Watkins42, Mait Metspalu17, Stanislav Dryomov, Rem I. Sukernik43, Lalji Singh44, Lalji Singh5, Kumarasamy Thangaraj44, Svante Pääbo7, Janet Kelso7, Nick Patterson1, David Reich1, David Reich2, David Reich3 
13 Oct 2016-Nature
TL;DR: It is demonstrated that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.
Abstract: Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.

Journal ArticleDOI
Aysu Okbay1, Jonathan P. Beauchamp2, Mark Alan Fontana3, James J. Lee4  +293 moreInstitutions (81)
26 May 2016-Nature
TL;DR: This article reports the results of a genome-wide association study (GWAS) for educational attainment, showing that single-nucleotide polymorphisms associated with educational attainment disproportionately occur in genomic regions regulating gene expression in the fetal brain.
Abstract: Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.

Journal ArticleDOI
TL;DR: The improved MitoCarta 2.0 inventory provides a molecular framework for system-level analysis of mammalian mitochondria and helps to understand mitochondrial pathways in health and disease.
Abstract: Mitochondria are complex organelles that house essential pathways involved in energy metabolism, ion homeostasis, signalling and apoptosis. To understand mitochondrial pathways in health and disease, it is crucial to have an accurate inventory of the organelle's protein components. In 2008, we made substantial progress toward this goal by performing in-depth mass spectrometry of mitochondria from 14 organs, epitope tagging/microscopy and Bayesian integration to assemble MitoCarta (www.broadinstitute.org/pubs/MitoCarta): an inventory of genes encoding mitochondrial-localized proteins and their expression across 14 mouse tissues. Using the same strategy we have now reconstructed this inventory separately for human and for mouse based on (i) improved gene transcript models, (ii) updated literature curation, including results from proteomic analyses of mitochondrial sub-compartments, (iii) improved homology mapping and (iv) updated versions of all seven original data sets. The updated human MitoCarta2.0 consists of 1158 human genes, including 918 genes in the original inventory as well as 240 additional genes. The updated mouse MitoCarta2.0 consists of 1158 genes, including 967 genes in the original inventory plus 191 additional genes. The improved MitoCarta 2.0 inventory provides a molecular framework for system-level analysis of mammalian mitochondria.

Journal ArticleDOI
TL;DR: Analysis of the tumour immune microenvironment in the context of anti-PD-1 therapy in two fully immunocompetent mouse models of lung adenocarcinoma suggests that upregulation of TIM-3 and other immune checkpoints may be targetable biomarkers associated with adaptive resistance to PD-1 blockade.
Abstract: Despite compelling antitumour activity of antibodies targeting the programmed death 1 (PD-1): programmed death ligand 1 (PD-L1) immune checkpoint in lung cancer, resistance to these therapies has increasingly been observed. In this study, to elucidate mechanisms of adaptive resistance, we analyse the tumour immune microenvironment in the context of anti-PD-1 therapy in two fully immunocompetent mouse models of lung adenocarcinoma. In tumours progressing following response to anti-PD-1 therapy, we observe upregulation of alternative immune checkpoints, notably T-cell immunoglobulin mucin-3 (TIM-3), in PD-1 antibody bound T cells and demonstrate a survival advantage with addition of a TIM-3 blocking antibody following failure of PD-1 blockade. Two patients who developed adaptive resistance to anti-PD-1 treatment also show a similar TIM-3 upregulation in blocking antibody-bound T cells at treatment failure. These data suggest that upregulation of TIM-3 and other immune checkpoints may be targetable biomarkers associated with adaptive resistance to PD-1 blockade.

Journal ArticleDOI
Heng Li1
TL;DR: A new mapper, minimap, and a de novo assembler, miniasm, are presented for efficiently mapping and assembling SMRT and ONT reads without an error correction stage.
Abstract: Motivation: Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10–15%. Complex and computationally intensive pipelines are required to assemble such reads. Results: We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. Availability and implementation: https://github.com/lh3/minimap and https://github.com/lh3/miniasm Contact: hengli@broadinstitute.org Supplementary information: Supplementary data are available at Bioinformatics online.
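
The pairwise read mapping format mentioned in the abstract (PAF) is a tab-separated text format with 12 mandatory columns per mapping. As a rough illustration only, the sketch below parses one such line in Python. The class name, field names, and the `approximate_identity` helper are hypothetical choices made for this example and are not part of minimap or miniasm; optional tag columns after the twelfth are simply ignored.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """One mapping line; field names are illustrative, not an official API."""
    query_name: str
    query_len: int
    query_start: int   # 0-based start on the query
    query_end: int     # end on the query (exclusive)
    strand: str        # "+" or "-"
    target_name: str
    target_len: int
    target_start: int  # 0-based start on the target
    target_end: int    # end on the target (exclusive)
    n_match: int       # number of matching residues
    block_len: int     # alignment block length
    mapq: int          # mapping quality (255 means missing)


def parse_paf_line(line: str) -> PafRecord:
    """Parse the 12 mandatory tab-separated columns of a PAF line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    return PafRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
        fields[4],
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]),
        int(fields[9]), int(fields[10]), int(fields[11]),
    )


def approximate_identity(rec: PafRecord) -> float:
    """Matches divided by block length; a crude proxy, assumed for this sketch."""
    return rec.n_match / rec.block_len if rec.block_len else 0.0
```

A record parsed this way could be filtered on overlap length or the identity proxy before further processing; any thresholds would be choices of the user rather than values taken from the paper.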

Journal ArticleDOI
TL;DR: A droplet-based, single-cell RNA-seq method is implemented to determine the transcriptomes of over 12,000 individual pancreatic cells from four human donors and two mouse strains and provides a resource for the discovery of novel cell type-specific transcription factors, signaling receptors, and medically relevant genes.
Abstract: Although the function of the mammalian pancreas hinges on complex interactions of distinct cell types, gene expression profiles have primarily been described with bulk mixtures. Here we implemented a droplet-based, single-cell RNA-seq method to determine the transcriptomes of over 12,000 individual pancreatic cells from four human donors and two mouse strains. Cells could be divided into 15 clusters that matched previously characterized cell types: all endocrine cell types, including rare epsilon-cells; exocrine cell types; vascular cells; Schwann cells; quiescent and activated stellate cells; and four types of immune cells. We detected subpopulations of ductal cells with distinct expression profiles and validated their existence with immuno-histochemistry stains. Moreover, among human beta-cells, we detected heterogeneity in the regulation of genes relating to functional maturation and levels of ER stress. Finally, we deconvolved bulk gene expression samples using the single-cell data to detect disease-associated differential expression. Our dataset provides a resource for the discovery of novel cell type-specific transcription factors, signaling receptors, and medically relevant genes.


Journal ArticleDOI
07 Jan 2016-Nature
TL;DR: Human IDH mutant gliomas exhibit hypermethylation at cohesin and CCCTC-binding factor (CTCF)-binding sites, compromising binding of this methylation-sensitive insulator protein; the resulting loss of insulation between topological domains permits aberrant activation of the glioma oncogene PDGFRA.
Abstract: Gain-of-function IDH mutations are initiating events that define major clinical and prognostic classes of gliomas. Mutant IDH protein produces a new onco-metabolite, 2-hydroxyglutarate, which interferes with iron-dependent hydroxylases, including the TET family of 5'-methylcytosine hydroxylases. TET enzymes catalyse a key step in the removal of DNA methylation. IDH mutant gliomas thus manifest a CpG island methylator phenotype (G-CIMP), although the functional importance of this altered epigenetic state remains unclear. Here we show that human IDH mutant gliomas exhibit hypermethylation at cohesin and CCCTC-binding factor (CTCF)-binding sites, compromising binding of this methylation-sensitive insulator protein. Reduced CTCF binding is associated with loss of insulation between topological domains and aberrant gene activation. We specifically demonstrate that loss of CTCF at a domain boundary permits a constitutive enhancer to interact aberrantly with the receptor tyrosine kinase gene PDGFRA, a prominent glioma oncogene. Treatment of IDH mutant gliomaspheres with a demethylating agent partially restores insulator function and downregulates PDGFRA. Conversely, CRISPR-mediated disruption of the CTCF motif in IDH wild-type gliomaspheres upregulates PDGFRA and increases proliferation. Our study suggests that IDH mutations promote gliomagenesis by disrupting chromosomal topology and allowing aberrant regulatory interactions that induce oncogene expression.

Journal ArticleDOI
25 Aug 2016-Cell
TL;DR: This work provides a systematic methodology for achieving comprehensive molecular classification of neurons, identifies novel neuronal types, and uncovers transcriptional differences that distinguish types within a class.

Journal ArticleDOI
17 Nov 2016-Nature
TL;DR: Cross-talk among neighbouring genes is a prevalent phenomenon that can involve multiple mechanisms and cis-regulatory signals, including a role for RNA splice sites. These mechanisms may explain the function and evolution of some genomic loci that produce lncRNAs and broadly contribute to the regulation of both coding and non-coding genes.
Abstract: Mammalian genomes are pervasively transcribed to produce thousands of long non-coding RNAs (lncRNAs). A few of these lncRNAs have been shown to recruit regulatory complexes through RNA-protein interactions to influence the expression of nearby genes, and it has been suggested that many other lncRNAs can also act as local regulators. Such local functions could explain the observation that lncRNA expression is often correlated with the expression of nearby genes. However, these correlations have been challenging to dissect and could alternatively result from processes that are not mediated by the lncRNA transcripts themselves. For example, some gene promoters have been proposed to have dual functions as enhancers, and the process of transcription itself may contribute to gene regulation by recruiting activating factors or remodelling nucleosomes. Here we use genetic manipulation in mouse cell lines to dissect 12 genomic loci that produce lncRNAs and find that 5 of these loci influence the expression of a neighbouring gene in cis. Notably, none of these effects requires the specific lncRNA transcripts themselves and instead involves general processes associated with their production, including enhancer-like activity of gene promoters, the process of transcription, and the splicing of the transcript. Furthermore, such effects are not limited to lncRNA loci: we find that four out of six protein-coding loci also influence the expression of a neighbour. These results demonstrate that cross-talk among neighbouring genes is a prevalent phenomenon that can involve multiple mechanisms and cis-regulatory signals, including a role for RNA splice sites. These mechanisms may explain the function and evolution of some genomic loci that produce lncRNAs and broadly contribute to the regulation of both coding and non-coding genes.

Journal ArticleDOI
Mary E. Dickinson, Ann M. Flenniken, Xiao Ji1, Lydia Teboul2, Michael D. Wong, Jacqueline K. White3, Terrence F. Meehan4, Wolfgang Weninger5, Henrik Westerberg2, Hibret A. Adissu6, Candice N. Baker, Lynette Bower7, James M. Brown2, L. Brianna Caddle, Francesco Chiani8, Dave Clary7, James Cleak2, Mark J. Daly9, James M. Denegre, Brendan Doe3, Mary E. Dolan, Sarah M. Edie, Helmut Fuchs, Valerie Gailus-Durner, Antonella Galli3, Alessia Gambadoro8, Juan Gallegos10, Shiying Guo11, Neil R. Horner2, Chih-Wei Hsu, Sara Johnson2, Sowmya Kalaga, Lance C. Keith, Louise Lanoue7, Thomas N. Lawson2, Monkol Lek12, Monkol Lek9, Manuel Mark13, Susan Marschall, Jeremy Mason4, Melissa L. McElwee, Susan Newbigging6, Lauryl M. J. Nutter6, Kevin A. Peterson, Ramiro Ramirez-Solis3, Douglas J. Rowland7, Edward Ryder3, Kaitlin E. Samocha12, Kaitlin E. Samocha9, John R. Seavitt10, Mohammed Selloum13, Zsombor Szoke-Kovacs2, Masaru Tamura, Amanda G. Trainor7, Ilinca Tudose4, Shigeharu Wakana, Jonathan Warren4, Olivia Wendling13, David B. West14, Leeyean Wong, Atsushi Yoshiki, Daniel G. MacArthur12, Daniel G. MacArthur9, Glauco P. Tocchini-Valentini8, Xiang Gao11, Paul Flicek4, Allan Bradley3, William C. Skarnes3, Monica J. Justice, Helen Parkinson4, Mark W. Moore, Sara Wells2, Robert E. Braun, Karen L. Svenson, Martin Hrabé de Angelis15, Yann Herault13, Timothy J. Mohun16, Ann-Marie Mallon2, R. Mark Henkelman, Steve D.M. Brown2, David J. Adams3, Kevin C K Lloyd7, Colin McKerlie6, Arthur L. Beaudet10, Maja Bucan1, Stephen A. Murray 
22 Sep 2016-Nature
TL;DR: It is shown that human disease genes are enriched for essential genes, thus providing a dataset that facilitates the prioritization and validation of mutations identified in clinical sequencing efforts and reveals that incomplete penetrance and variable expressivity are common even on a defined genetic background.
Abstract: Approximately one-third of all mammalian genes are essential for life. Phenotypes resulting from knockouts of these genes in mice have provided tremendous insight into gene function and congenital disorders. As part of the International Mouse Phenotyping Consortium effort to generate and phenotypically characterize 5,000 knockout mouse lines, here we identify 410 lethal genes during the production of the first 1,751 unique gene knockouts. Using a standardized phenotyping platform that incorporates high-resolution 3D imaging, we identify phenotypes at multiple time points for previously uncharacterized genes and additional phenotypes for genes with previously reported mutant phenotypes. Unexpectedly, our analysis reveals that incomplete penetrance and variable expressivity are common even on a defined genetic background. In addition, we show that human disease genes are enriched for essential genes, thus providing a dataset that facilitates the prioritization and validation of mutations identified in clinical sequencing efforts.

Journal ArticleDOI
TL;DR: The findings show that schizophrenia is polygenic and highlight the utility of this resource of gene expression and its genetic regulation for mechanistic interpretations of genetic liability for brain diseases.
Abstract: Over 100 genetic loci harbor schizophrenia associated variants, yet how these variants confer liability is uncertain. The CommonMind Consortium sequenced RNA from dorsolateral prefrontal cortex of schizophrenia cases (N = 258) and control subjects (N = 279), creating a resource of gene expression and its genetic regulation. Using this resource, ~20% of schizophrenia loci have variants that could contribute to altered gene expression and liability. In five loci, only a single gene was involved: FURIN, TSNARE1, CNTN4, CLCN3, or SNAP91. Altering expression of FURIN, TSNARE1, or CNTN4 changes neurodevelopment in zebrafish; knockdown of FURIN in human neural progenitor cells yields abnormal migration. Of 693 genes showing significant case/control differential expression, their fold changes are ≤ 1.33, and an independent cohort yields similar results. Gene co-expression implicates a network relevant for schizophrenia. Our findings show schizophrenia is polygenic and highlight the utility of this resource for mechanistic interpretations of genetic liability for brain diseases.

Journal ArticleDOI
22 Jan 2016-Science
TL;DR: In this paper, an adeno-associated virus was used to deliver the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system to the mdx mouse model of Duchenne muscular dystrophy (DMD) to remove the mutated exon 23 from the dystrophin gene.
Abstract: Duchenne muscular dystrophy (DMD) is a devastating disease affecting about 1 out of 5000 male births and caused by mutations in the dystrophin gene. Genome editing has the potential to restore expression of a modified dystrophin gene from the native locus to modulate disease progression. In this study, adeno-associated virus was used to deliver the clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 system to the mdx mouse model of DMD to remove the mutated exon 23 from the dystrophin gene. This includes local and systemic delivery to adult mice and systemic delivery to neonatal mice. Exon 23 deletion by CRISPR-Cas9 resulted in expression of the modified dystrophin gene, partial recovery of functional dystrophin protein in skeletal myofibers and cardiac muscle, improvement of muscle biochemistry, and significant enhancement of muscle force. This work establishes CRISPR-Cas9–based genome editing as a potential therapy to treat DMD.


Journal ArticleDOI
TL;DR: CEL-Seq2, a modified version of the CEL-Seq method with threefold higher sensitivity, lower costs, and less hands-on time, is uniquely suited to single-cell RNA-Seq analysis in terms of economics, resolution, and ease of use.
Abstract: Single-cell transcriptomics requires a method that is sensitive, accurate, and reproducible. Here, we present CEL-Seq2, a modified version of our CEL-Seq method, with threefold higher sensitivity, lower costs, and less hands-on time. We implemented CEL-Seq2 on Fluidigm’s C1 system, providing its first single-cell, on-chip barcoding method, and we detected gene expression changes accompanying the progression through the cell cycle in mouse fibroblast cells. We also compare with Smart-Seq to demonstrate CEL-Seq2’s increased sensitivity relative to other available methods. Collectively, the improvements make CEL-Seq2 uniquely suited to single-cell RNA-Seq analysis in terms of economics, resolution, and ease of use.

Journal ArticleDOI
11 Jul 2016-Nature
TL;DR: In this paper, the authors performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing for 12,940 individuals from five ancestry groups.
Abstract: The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.