Author
Bruce W. Birren
Other affiliations: Massachusetts Institute of Technology, California Institute of Technology, Bio-Rad Laboratories
Bio: Bruce W. Birren is an academic researcher at the Broad Institute. The author has contributed to research in the topics Genome and Gene. The author has an h-index of 103 and has co-authored 205 publications receiving 113,491 citations. Previous affiliations of Bruce W. Birren include the Massachusetts Institute of Technology and the California Institute of Technology.
Topics: Genome, Gene, Genomics, Population, Human genome
Papers
TL;DR: This analysis demonstrates the potential of a longitudinal genomic surveillance approach to detect resistance-associated loci and improve the mechanistic understanding of how resistance develops; new resistance markers identified this way may help retard the emergence or spread of ART-R in African parasite populations.
Abstract: Background: Artemisinin-based combination therapies are the first line of treatment for Plasmodium falciparum infections worldwide, but artemisinin resistance (ART-R) has risen rapidly in Southeast Asia over the last decade. Mutations in kelch13 have been associated with artemisinin (ART) resistance in this region. To explore the power of longitudinal genomic surveillance to detect signals in kelch13 and other loci that contribute to ART or partner drug resistance, we retrospectively sequenced the genomes of 194 P. falciparum isolates from five sites in Northwest Thailand, bracketing the era in which there was a rapid increase in ART-R in this region (2001–2014). Results: We evaluated statistical metrics for temporal change in the frequency of individual SNPs, assuming that SNPs associated with resistance should increase in frequency over this period. After kelch13-C580Y, the strongest temporal change was seen at a SNP in phosphatidylinositol 4-kinase (PI4K), situated in a pathway recently implicated in the ART-R mechanism. However, other loci exhibit temporal signatures nearly as strong and warrant further investigation for involvement in ART-R evolution. Through genome-wide association analysis we also identified a variant in a kelch-domain-containing gene on chromosome 10 that may epistatically modulate ART-R. Conclusions: This analysis demonstrates the potential of a longitudinal genomic surveillance approach to detect resistance-associated loci and improve our mechanistic understanding of how resistance develops. Evidence for additional genomic regions outside of the kelch13 locus associated with ART-R parasites may yield new molecular markers for resistance surveillance and may retard the emergence or spread of ART-R in African parasite populations.
5 citations
TL;DR: The mission of the TBCAP was to improve the Mtb structural annotations, in particular to resolve the problems of missing genes, incorrect start sites, and poor or conflicting operon definitions in the current publicly available databases.
5 citations
TL;DR: In this article, the authors used the Zoonomia multispecies alignment to evaluate how historical effective population size (Ne) affects heterozygosity and deleterious genetic load and how these factors may contribute to extinction risk.
Abstract: Species persistence can be influenced by the amount, type, and distribution of diversity across the genome, suggesting a potential relationship between historical demography and resilience. In this study, we surveyed genetic variation across single genomes of 240 mammals that compose the Zoonomia alignment to evaluate how historical effective population size (Ne) affects heterozygosity and deleterious genetic load and how these factors may contribute to extinction risk. We find that species with smaller historical Ne carry a proportionally larger burden of deleterious alleles owing to long-term accumulation and fixation of genetic load and have a higher risk of extinction. This suggests that historical demography can inform contemporary resilience. Models that included genomic data were predictive of species’ conservation status, suggesting that, in the absence of adequate census or ecological data, genomic information may provide an initial risk assessment.
Introduction: The Anthropocene is marked by an accelerated loss of biodiversity, widespread population declines, and a global conservation crisis. Given limited resources for conservation intervention, an approach is needed to identify threatened species from among the thousands lacking adequate information for status assessments. Such prioritization for intervention could come from genome sequence data, as genomes contain information about demography, diversity, fitness, and adaptive potential. However, the relevance of genomic data for identifying at-risk species is uncertain, in part because genetic variation may reflect past events and life histories better than contemporary conservation status.
Rationale: The Zoonomia multispecies alignment presents an opportunity to systematically compare neutral and functional genomic diversity and their relationships to contemporary extinction risk across a large sample of diverse mammalian taxa. We surveyed 240 species spanning the “Least Concern” to “Critically Endangered” categories, as published in the International Union for Conservation of Nature’s Red List of Threatened Species. Using a single genome for each species, we estimated historical effective population sizes (Ne) and distributions of genome-wide heterozygosity. To estimate genetic load, we identified substitutions relative to reconstructed ancestral sequences, assuming that mutations at evolutionarily conserved sites and in protein-coding sequences, especially in genes essential for viability in mice, are predominantly deleterious. We examined relationships between the conservation status of species and metrics of heterozygosity, demography, and genetic load and used these data to train and test models to distinguish threatened from nonthreatened species.
Results: Species with smaller historical Ne are more likely to be categorized as at risk of extinction, suggesting that demography, even from periods more than 10,000 years in the past, may be informative of contemporary resilience. Species with smaller historical Ne also carry proportionally higher burdens of weakly and moderately deleterious alleles, consistent with theoretical expectations of the long-term accumulation and fixation of genetic load under strong genetic drift. We found weak support for a causative link between fixed drift load and extinction risk; however, other types of genetic load not captured in our data, such as rare, highly deleterious alleles, may also play a role. Although ecological (e.g., physiological, life-history, and behavioral) variables were the best predictors of extinction risk, genomic variables nonrandomly distinguished threatened from nonthreatened species in regression and machine learning models. These results suggest that information encoded within even a single genome can provide a risk assessment in the absence of adequate ecological or population census data.
Conclusion: Our analysis highlights the potential for genomic data to rapidly and inexpensively gauge extinction risk by leveraging relationships between contemporary conservation status and genetic variation shaped by the long-term demographic history of species. As more resequencing data and additional reference genomes become available, estimates of genetic load, estimates of recent demographic history, and accuracy of predictive models will improve. We therefore echo calls for including genomic information in assessments of the conservation status of species.
Genomic information can help predict extinction risk in diverse mammalian species. Across 240 mammals, species with smaller historical Ne had lower genetic diversity and higher genetic load and were more likely to be threatened with extinction. Genomic data were used to train models that predict whether a species is threatened, which can be valuable for assessing extinction risk in species lacking ecological or census data. [Animal silhouettes are from PhyloPic]
5 citations
TL;DR: The whole-genome sequencing of two extensively drug-resistant tuberculosis strains belonging to the Euro-American S lineage showed single-nucleotide polymorphisms predicted to have drug efflux activity.
Abstract: We report the whole-genome sequencing of two extensively drug-resistant tuberculosis strains belonging to the Euro-American S lineage. The RSA 114 strain showed single-nucleotide polymorphisms predicted to have drug efflux activity.
4 citations
Max Planck Society; University of Oxford; University of Colorado Boulder; Michigan State University; Marine Biological Laboratory; Argonne National Laboratory; European Bioinformatics Institute; Yonsei University; University of Manchester; Massachusetts Institute of Technology; New York University; National Institutes of Health; Jacobs University Bremen; Los Alamos National Laboratory; University of Maryland, Baltimore; Ghent University; Lawrence Berkeley National Laboratory; University of Southern California; National Ecological Observatory Network; Human Genome Sequencing Center; University of New Mexico; Washington University in St. Louis; Joint Genome Institute; Baylor College of Medicine; Cornell University; Cooperative Institute for Research in Environmental Sciences; Technische Universität München; J. Craig Venter Institute; University of Waterloo; Oak Ridge National Laboratory; National Science Foundation; Vrije Universiteit Brussel; Stanford University
TL;DR: The authors present the Genomic Standards Consortium's (GSC) Minimum Information about an ENvironmental Sequence (MIENS) standard for describing marker genes.
Abstract: We present the Genomic Standards Consortium’s (GSC) “Minimum Information about an ENvironmental Sequence” (MIENS) standard for describing marker genes. Adoption of MIENS will enhance our ability to analyze natural genetic diversity across the Tree of Life as it is currently being documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
4 citations
Cited by
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
22,269 citations
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
20,557 citations
28 Jul 2005
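TL;DR: Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) interacts with one or more receptors on infected erythrocytes, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion.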
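Abstract: Antigenic variation allows many pathogenic microorganisms to evade host immune responses. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.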
18,940 citations
TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.
16,859 citations
TL;DR: The authors present Trinity, a method for de novo assembly of full-length transcripts, and evaluate it on samples from fission yeast, mouse and whitefly (whose reference genome is not yet available), showing that it provides a unified solution for transcriptome reconstruction in any sample.
Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
15,665 citations