
Showing papers by "Bruce W. Birren" published in 2023


Journal Article
09 Mar 2023-Science
TL;DR: The authors explore the evolution of placental mammals, including humans, through reference-free whole-genome alignment of 240 species and protein-coding alignments for 428 species, and estimate that 10.7% of the human genome is evolutionarily constrained.
Abstract: Evolutionary constraint and acceleration are powerful, cell-type agnostic measures of functional importance. Previous studies in mammals were limited by species number and reliance on human-referenced alignments. We explore the evolution of placental mammals, including humans, through reference-free whole-genome alignment of 240 species and protein-coding alignments for 428 species. We estimate 10.7% of the human genome is evolutionarily constrained. We resolve constraint to single nucleotides, pinpointing functional positions, and refine and expand by over seven-fold the catalog of ultraconserved elements. Overall, 48.5% of constrained bases are as yet unannotated, suggesting yet-to-be-discovered functional importance. Using species-level phenotypes and an updated phylogeny, we associate coding and regulatory variation with olfaction and hibernation. Focusing on biodiversity conservation, we identify genomic metrics that predict species at risk of extinction.

18 citations


Journal Article
Patrick F. Sullivan, Jennifer R. S. Meadows, Steven Gazal, BaDoi N. Phan, Gregory R. Andrews, Sharadha Sakthikumar, Jessika Nordin, Ananya Roy, Chao Wang, James Xue, Shuyang Yao, Quan Sun, Jin P. Szatkiewicz, Jia Wen, Laura M. Huckins, Zhili Zheng, Jian Zeng, Naomi R. Wray, Yun Li, Jessica S. Johnson, Jiawen Chen, Benedict Paten, Zhiping Weng, Andreas R. Pfenning, Elinor K. Karlsson, Joel C. Armstrong, Matteo Bianchi, Bruce W. Birren, Kevin R. Bredemeyer, Ana M Breit, Matthew J. Christmas, Hiram Clawson, Joana Damas, Federica Di Palma, Mark Diekhans, Michael X. Dong, Eduardo Eizirik, Kaili Fan, Cornelia E. Fanter, Nicole M. Foley, Karin Forsberg-Nilsson, John Gatesy, Diane P. Genereux, Linda Goodman, Jenna R. Grimshaw, Michaela K. Halsey, Andrew J. Harris, Glenn Hickey, Michael Hiller, Allyson Hindle, Robert Hubley, Graham M. Hughes, Jeremy A. Johnson, David Juan, Irene M. Kaplow, Kathleen C. Keough, Bogdan M. Kirilenko, Klaus-Peter Koepfli, Jennifer M. Korstian, Amanda Kowalczyk, Sergey V. Kozyrev, Alyssa J. Lawler, Colleen Lawless, Thomas Lehmann, Daniel Lévesque, Harris A. Lewin, Xue Li, Abigail L. Lind, Kerstin Lindblad-Toh, Ava Mackay-Smith, Voichita D. Marinescu, Tomas Marques-Bonet, Victor C. Mason, Wynn K. Meyer, Jill Moore, Lucas R. Moreira, Diana D. Moreno-Santillán, Kathleen Morrill, Gerard Muntané, William J. Murphy, Arcadi Navarro, Martin T. Nweeia, Sylvia Ortmann, Austin B. Osmanski, Nicole Paulat, Katherine S. Pollard, Henry Pratt, David A. Ray, Steven K. Reilly, Jeb Rosen, Irina Ruf, Louise Ryan, Oliver A. Ryder, Pardis C. Sabeti, Daniel E. Schäffer, Aitor Serres, Beth Shapiro, Arian F.A. Smit, Mark S. Springer, Chaitanya Srinivasan, Cynthia C. Steiner, Jessica M. Storer, Kevin A.M. Sullivan, Elisabeth Sundström, Megan A. Supple, Ross Swofford, Joy-El R B Talbot, Emma C. Teeling, Jason Turner-Maier, Alejandro Valenzuela, Franziska Wagner, Ola Wallerman, Juehan Wang, Aryn P. Wilder, Morgan Wirthlin, Xiaomeng Zhang 
10 Mar 2023-Science
TL;DR: Single-base phyloP scores from the whole-genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained and likely functional.
Abstract: Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single base phyloP scores from the whole genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained, and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability (more than any other functional annotation). Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.

13 citations


Journal Article
Patrick F. Sullivan, BaDoi N. Phan, Xue Li, Diane P. Genereux, Michael X. Dong, Sharadha Sakthikumar, Jessika Nordin, Ananya Roy, Voichita D. Marinescu, Chao Wang, Ola Wallerman, Shuyang Yao, Quan Sun, Jin P. Szatkiewicz, Jia Wen, Laura M. Huckins, Zhili Zheng, Jian Zeng, Naomi R. Wray, Yun Li, Jessica S. Johnson, Jiawen Chen, Steven K. Reilly, Graham M. Hughes, Andreas R. Pfenning, Kerstin Lindblad-Toh, Gregory R. Andrews, Joel C. Armstrong, Matteo Bianchi, Bruce W. Birren, Kevin R. Bredemeyer, Ana M Breit, Matthew J. Christmas, Hiram Clawson, Joana Damas, Federica Di Palma, Mark Diekhans, Eduardo Eizirik, Kaili Fan, Cornelia E. Fanter, Nicole M. Foley, Karin Forsberg-Nilsson, John Gatesy, Steven Gazal, Linda Goodman, Jenna R. Grimshaw, Michaela K. Halsey, Andrew J. Harris, Glenn Hickey, Michael Hiller, Allyson Hindle, Robert Hubley, Jeremy A. Johnson, David Juan, Irene M. Kaplow, Elinor K. Karlsson, Kathleen C. Keough, Bogdan M. Kirilenko, Klaus-Peter Koepfli, Jennifer M. Korstian, Amanda Kowalczyk, Sergey V. Kozyrev, Alyssa J. Lawler, Colleen Lawless, Thomas Lehmann, Daniel Lévesque, Harris A. Lewin, Abigail L. Lind, Ava Mackay-Smith, Tomas Marques-Bonet, Victor C. Mason, Jennifer R. S. Meadows, Wynn K. Meyer, Jill Moore, Lucas R. Moreira, Diana D. Moreno-Santillán, Kathleen Morrill, Gerard Muntané, William J. Murphy, Arcadi Navarro, Martin T. Nweeia, Sylvia Ortmann, Austin B. Osmanski, Benedict Paten, Nicole Paulat, Katherine S. Pollard, Henry Pratt, David A. Ray, Jeb Rosen, Irina Ruf, Louise Ryan, Oliver A. Ryder, Pardis C. Sabeti, Daniel E. Schäffer, Aitor Serres, Beth Shapiro, Arian F.A. Smit, Mark S. Springer, Chaitanya Srinivasan, Cynthia C. Steiner, Jessica M. Storer, Kevin A.M. Sullivan, Elisabeth Sundström, Megan A. Supple, Ross Swofford, Joy-El R B Talbot, Emma C. Teeling, Jason Turner-Maier, Alejandro Valenzuela, Franziska Wagner, Juehan Wang, Zhiping Weng, Aryn P. Wilder, Morgan Wirthlin, James Xue, Xiaomeng Zhang 
28 Apr 2023-Science
TL;DR: Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional; constrained positions are enriched for variants that explain common disease heritability more than other functional annotations.
Abstract: Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.

11 citations


Journal Article
Gregory R. Andrews, Nishigandha N. Phalke, Elinor K. Karlsson, Steven Gazal, Joel C. Armstrong, Matteo Bianchi, Bruce W. Birren, Kevin R. Bredemeyer, Ana M Breit, Matthew J. Christmas, Hiram Clawson, Joana Damas, Federica Di Palma, Mark Diekhans, Michael X. Dong, Eduardo Eizirik, Kaili Fan, Cornelia E. Fanter, Nicole M. Foley, Karin Forsberg-Nilsson, John Gatesy, Diane P. Genereux, Linda Goodman, Jenna R. Grimshaw, Michaela K. Halsey, Andrew J. Harris, Glenn Hickey, Michael Hiller, Allyson Hindle, Robert Hubley, Graham M. Hughes, Jeremy A. Johnson, David Juan, Irene M. Kaplow, Kathleen C. Keough, Bogdan M. Kirilenko, Klaus-Peter Koepfli, Jennifer M. Korstian, Amanda Kowalczyk, Sergey V. Kozyrev, Alyssa J. Lawler, Colleen Lawless, Thomas Lehmann, Daniel Lévesque, Harris A. Lewin, Xue Li, Abigail L. Lind, Kerstin Lindblad-Toh, Ava Mackay-Smith, Voichita D. Marinescu, Tomas Marques-Bonet, Victor C. Mason, Jennifer R. S. Meadows, Wynn K. Meyer, Jill Moore, Lucas R. Moreira, Diana D. Moreno-Santillán, Kathleen Morrill, Gerard Muntané, William J. Murphy, Arcadi Navarro, Martin T. Nweeia, Sylvia Ortmann, Austin B. Osmanski, Benedict Paten, Nicole Paulat, Andreas R. Pfenning, BaDoi N. Phan, Katherine S. Pollard, Henry Pratt, David A. Ray, Steven K. Reilly, Jeb Rosen, Irina Ruf, Louise Ryan, Oliver A. Ryder, Pardis C. Sabeti, Daniel E. Schäffer, Aitor Serres, Beth Shapiro, Arian F.A. Smit, Mark S. Springer, Chaitanya Srinivasan, Cynthia C. Steiner, Jessica M. Storer, Kevin A.M. Sullivan, Patrick F. Sullivan, Elisabeth Sundström, Megan A. Supple, Ross Swofford, Joy-El R B Talbot, Emma C. Teeling, Jason Turner-Maier, Alejandro Valenzuela, Franziska Wagner, Ola Wallerman, Chao Wang, Juehan Wang, Zhiping Weng, Aryn P. Wilder, Morgan Wirthlin, James Xue, Xiaomeng Zhang 
28 Apr 2023-Science
TL;DR: The evolutionary trajectories of 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs) were charted using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium.
Abstract: Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, whereas genes near primate-specific elements are involved in environmental interaction, including odor perception and immune response. About 20% of TFBSs are transposable element–derived and exhibit intricate patterns of gains and losses during primate evolution, whereas sequence variants associated with complex traits are enriched in constrained TFBSs. Our annotations illuminate the regulatory functions of the human genome.
Description:
INTRODUCTION: Mammals, including humans, achieve high levels of organismal complexity largely due to how their proteins are regulated; characterizing the regulatory landscape of the human genome is a longstanding goal of modern biology. Contemporary approaches measure genome-wide biochemical signals, including chromatin accessibility, histone modifications, DNA methylation, and binding of ~1600 transcription factors (TFs) by the human genome. Using these methods, the ENCODE consortium defined almost one million candidate cis-regulatory elements (cCREs). Another approach uses evolutionary conservation to identify potential regulatory regions. We combine these approaches, examining how different functional classes of regulatory elements respond to evolutionary pressures.
RATIONALE: cCREs tend to be conserved and cCRE classes exhibit varying levels of conservation, suggesting interesting evolutionary dynamics. We examine these dynamics in placental mammals using tools developed by the Zoonomia project: the evolutionary constraint in placental mammals and the reference-free 241-genome alignment. We identify the human cCREs and transcription factor binding sites (TFBSs) conserved in the mammalian lineage, characterize the evolutionary histories of cCREs and TFBSs, identify the driving forces behind their gains and losses, and, using biochemical and epigenomic data, assess the likelihood that conserved cCREs and TFBSs are functional in humans and other mammals.
RESULTS: We explored the ENCODE cCREs derived from epigenomic data and the binding sites of 367 TFs from chromatin immunoprecipitation data. We found a spectrum of mammalian conservation for regulatory elements: on one end lie the highly conserved cCREs and constrained TFBSs, and on the other are primate-specific cCREs and TFBSs overlapping transposable elements (TEs). Conserved elements predominate near genes that function in fundamental cellular processes (metabolism, development) and tend to be functional in other mammalian genomes, whereas unconstrained elements lie near genes involved in interaction with the environment. We identified ~439 thousand deeply conserved cCREs (47.5% of cCREs and 4% of the human genome) and 2 million TFBSs (0.8% of the human genome) under mammalian constraint. Using a panel of 69 genome-wide association studies, we found that conserved cCREs and constrained TFBSs achieved high heritability enrichment, demonstrating their utility for functional interpretation of human genetic variants. Meanwhile, more than 85% of primate-specific TFBSs (representing more than 20% of all TFBSs) are derived from TEs. Phylogenetic analysis revealed a staggering number of TFBS clusters sharing patterns of presence and absence across primate genomes and enrichment in specific TE families, suggesting that multiple waves of TE insertion spread these TFBSs during primate evolution.
CONCLUSION: We charted the evolutionary landscapes of cCREs and TFBSs among placental mammals, identifying a subset of elements under purifying selection in the mammalian lineage. These elements are highly enriched in the human genetic variants associated with a panel of diverse, complex traits, with heritability enrichment contributed by both nucleotides under mammalian and nucleotides under primate constraint.
Figure caption: Mammalian evolution of the human regulatory landscape. (A) Distribution of human cCREs by the number of genomes they align. (B) Projection of cCREs by alignments to the other 240 mammalian genomes. (C) Projection of HNF4A sites (constrained, red; unconstrained, blue). (D) Heritability enrichment for 69 human traits in partitions of TFBSs ordered by evolutionary constraint. (E) Heritability enrichment for human traits by subsets of TFBSs.

7 citations


Journal Article
Megan A. Supple, Ayshwarya Subramanian, Anish Mudide, Ross Swofford, Aitor Serres-Armero, Cynthia C. Steiner, Klaus-Peter Koepfli, Elinor K. Karlsson, Kerstin Lindblad-Toh, Tomas Marques-Bonet, Violeta Munoz Fuentes, Kathleen Foley, Oliver A. Ryder, Beth Shapiro, Gregory R. Andrews, Joel C. Armstrong, Matteo Bianchi, Bruce W. Birren, Kevin R. Bredemeyer, Ana M Breit, Matthew J. Christmas, Hiram Clawson, Joana Damas, Federica Di Palma, Mark Diekhans, Michael X. Dong, Eduardo Eizirik, Kaili Fan, Cornelia E. Fanter, Nicole M. Foley, Karin Forsberg-Nilsson, Carlos Jose de Armas Garcia, John Gatesy, Steven Gazal, Diane P. Genereux, Linda Goodman, Jenna R. Grimshaw, Michaela K. Halsey, Andrew J. Harris, Glenn Hickey, Michael Hiller, Allyson Hindle, Robert Hubley, Graham M. Hughes, Jeremy A. Johnson, David Juan, Irene M. Kaplow, Kathleen C. Keough, Bogdan M. Kirilenko, Jennifer M. Korstian, Amanda Kowalczyk, Sergey V. Kozyrev, Alyssa J. Lawler, Colleen Lawless, Thomas Lehmann, Daniel Lévesque, Harris A. Lewin, Xue Li, Abigail L. Lind, Ava Mackay-Smith, Voichita D. Marinescu, Victor C. Mason, Jennifer R. S. Meadows, Wynn K. Meyer, Jill Moore, Lucas R. Moreira, Diana D. Moreno-Santillán, Kathleen Morrill, Gerard Muntané, William J. Murphy, Arcadi Navarro, Martin T. Nweeia, Sylvia Ortmann, Austin B. Osmanski, Benedict Paten, Nicole Paulat, Andreas R. Pfenning, BaDoi N. Phan, Katherine S. Pollard, Henry Pratt, David A. Ray, Steven K. Reilly, Jeb Rosen, Irina Ruf, Louise Ryan, Pardis C. Sabeti, Daniel E. Schäffer, Aitor Serres, Arian F.A. Smit, Mark S. Springer, Chaitanya Srinivasan, Jessica M. Storer, Kevin A.M. Sullivan, Patrick F. Sullivan, Elisabeth Sundström, Joy-El R B Talbot, Emma C. Teeling, Jason Turner-Maier, Alejandro E. J. Valenzuela, Franziska Wagner, Ola Wallerman, Chao Wang, Juehan Wang, Zhiping Weng, Aryn P. Wilder, Morgan Wirthlin, James Xue, Xiaomeng Zhang 
28 Apr 2023-Science
TL;DR: The authors used the Zoonomia multispecies alignment to evaluate how historical effective population size (Ne) affects heterozygosity and deleterious genetic load and how these factors may contribute to extinction risk.
Abstract: Species persistence can be influenced by the amount, type, and distribution of diversity across the genome, suggesting a potential relationship between historical demography and resilience. In this study, we surveyed genetic variation across single genomes of 240 mammals that compose the Zoonomia alignment to evaluate how historical effective population size (Ne) affects heterozygosity and deleterious genetic load and how these factors may contribute to extinction risk. We find that species with smaller historical Ne carry a proportionally larger burden of deleterious alleles owing to long-term accumulation and fixation of genetic load and have a higher risk of extinction. This suggests that historical demography can inform contemporary resilience. Models that included genomic data were predictive of species’ conservation status, suggesting that, in the absence of adequate census or ecological data, genomic information may provide an initial risk assessment.
Description:
INTRODUCTION: The Anthropocene is marked by an accelerated loss of biodiversity, widespread population declines, and a global conservation crisis. Given limited resources for conservation intervention, an approach is needed to identify threatened species from among the thousands lacking adequate information for status assessments. Such prioritization for intervention could come from genome sequence data, as genomes contain information about demography, diversity, fitness, and adaptive potential. However, the relevance of genomic data for identifying at-risk species is uncertain, in part because genetic variation may reflect past events and life histories better than contemporary conservation status.
RATIONALE: The Zoonomia multispecies alignment presents an opportunity to systematically compare neutral and functional genomic diversity and their relationships to contemporary extinction risk across a large sample of diverse mammalian taxa. We surveyed 240 species spanning from the “Least Concern” to “Critically Endangered” categories, as published in the International Union for Conservation of Nature’s Red List of Threatened Species. Using a single genome for each species, we estimated historical effective population sizes (Ne) and distributions of genome-wide heterozygosity. To estimate genetic load, we identified substitutions relative to reconstructed ancestral sequences, assuming that mutations at evolutionarily conserved sites and in protein-coding sequences, especially in genes essential for viability in mice, are predominantly deleterious. We examined relationships between the conservation status of species and metrics of heterozygosity, demography, and genetic load and used these data to train and test models to distinguish threatened from nonthreatened species.
RESULTS: Species with smaller historical Ne are more likely to be categorized as at risk of extinction, suggesting that demography, even from periods more than 10,000 years in the past, may be informative of contemporary resilience. Species with smaller historical Ne also carry proportionally higher burdens of weakly and moderately deleterious alleles, consistent with theoretical expectations of the long-term accumulation and fixation of genetic load under strong genetic drift. We found weak support for a causative link between fixed drift load and extinction risk; however, other types of genetic load not captured in our data, such as rare, highly deleterious alleles, may also play a role. Although ecological (e.g., physiological, life-history, and behavioral) variables were the best predictors of extinction risk, genomic variables nonrandomly distinguished threatened from nonthreatened species in regression and machine learning models. These results suggest that information encoded within even a single genome can provide a risk assessment in the absence of adequate ecological or population census data.
CONCLUSION: Our analysis highlights the potential for genomic data to rapidly and inexpensively gauge extinction risk by leveraging relationships between contemporary conservation status and genetic variation shaped by the long-term demographic history of species. As more resequencing data and additional reference genomes become available, estimates of genetic load, estimates of recent demographic history, and accuracy of predictive models will improve. We therefore echo calls for including genomic information in assessments of the conservation status of species.
Figure caption: Genomic information can help predict extinction risk in diverse mammalian species. Across 240 mammals, species with smaller historical Ne had lower genetic diversity, higher genetic load, and were more likely to be threatened with extinction. Genomic data were used to train models that predict whether a species is threatened, which can be valuable for assessing extinction risk in species lacking ecological or census data. [Animal silhouettes are from PhyloPic]

5 citations


Journal Article
Nicole Paulat, Jenna R. Grimshaw, Diana D. Moreno-Santillán, Claudia Crookshanks, Jacquelyn Roberts, Carlos J. Garcia, Matthew G. Johnson, Llewellyn D. Densmore, Richard D. Stevens, Jeb Rosen, Jessica M. Storer, Arian F.A. Smit, Liliana M. Dávalos, Elinor K. Karlsson, David A. Ray, Gregory R. Andrews, Joel C. Armstrong, Matteo Bianchi, Bruce W. Birren, Kevin R. Bredemeyer, Ana M Breit, Matthew J. Christmas, Hiram Clawson, Joana Damas, Federica Di Palma, Mark Diekhans, Michael X. Dong, Eduardo Eizirik, Kaili Fan, Cornelia E. Fanter, Nicole M. Foley, Karin Forsberg-Nilsson, Carlos Jose de Armas Garcia, John Gatesy, Steven Gazal, Diane P. Genereux, Linda Goodman, Michaela K. Halsey, Andrew J. Harris, Glenn Hickey, Michael Hiller, Allyson Hindle, Robert Hubley, Graham M. Hughes, Jeremy A. Johnson, David Juan, Irene M. Kaplow, Kathleen C. Keough, Bogdan M. Kirilenko, Klaus-Peter Koepfli, Jennifer M. Korstian, Amanda Kowalczyk, Sergey V. Kozyrev, Alyssa J. Lawler, Colleen Lawless, Thomas Lehmann, Daniel Lévesque, Harris A. Lewin, Xue Li, Abigail L. Lind, Kerstin Lindblad-Toh, Ava Mackay-Smith, Voichita D. Marinescu, Tomas Marques-Bonet, Victor C. Mason, Jennifer R. S. Meadows, Wynn K. Meyer, Jill Moore, Lucas R. Moreira, Kathleen Morrill, Gerard Muntané, William J. Murphy, Arcadi Navarro, Martin T. Nweeia, Sylvia Ortmann, Austin B. Osmanski, Benedict Paten, Andreas R. Pfenning, BaDoi N. Phan, Katherine S. Pollard, Henry Pratt, Steven K. Reilly, Irina Ruf, Louise Ryan, Oliver A. Ryder, Pardis C. Sabeti, Daniel E. Schäffer, Aitor Serres, Beth Shapiro, Mark S. Springer, Chaitanya Srinivasan, Cynthia C. Steiner, Kevin A.M. Sullivan, Patrick F. Sullivan, Elisabeth Sundström, Megan A. Supple, Ross Swofford, Joy-El R B Talbot, Emma C. Teeling, Jason Turner-Maier, Alejandro Valenzuela, Franziska Wagner, Ola Wallerman, Chao Wang, Juehan Wang, Zhiping Weng, Aryn P. Wilder, Morgan Wirthlin, James Xue, Xiaomeng Zhang 
28 Apr 2023-Science
TL;DR: The authors examined the transposable element (TE) content of 248 mammalian genome assemblies and found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation.
Abstract: We examined transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly long interspersed elements, drive increases in genome size, whereas DNA transposons are associated with smaller genomes. Mammals tend to accumulate only a few types of TEs at any given time, with one TE type dominating. We also found association between dietary habit and the presence of DNA transposon invasions. These detailed annotations will serve as a benchmark for future comparative TE analyses among placental mammals.
Description:
INTRODUCTION: An estimated 160 million years have passed since the first placental mammals evolved. These eutherians are categorized into 19 orders consisting of nearly 4000 extant species, with ~70% being bats or rodents. Broad, in-depth, and comparative genomic studies across Eutheria have previously been unachievable because of the lack of genomic resources. The collaboration of the Zoonomia Consortium made available hundreds of high-quality genome assemblies for comparative analysis. Our focus within the consortium was to investigate the evolution of transposable elements (TEs) among placental mammals. Using these data, we identified previously known TEs, described previously unknown TEs, and analyzed the TE distribution among multiple taxonomic levels.
RATIONALE: The emergence of accurate and affordable sequencing technology has propelled efforts to sequence increasingly more nonmodel mammalian genomes in the past decade. Most of these efforts have traditionally focused on genic regions searching for patterns of selection or variation in gene regulation. The common trend of ignoring or trivializing TE annotation with newly published genomes has resulted in severe lag of TE analyses, leading to extensive undiscovered TE variation. This oversight has neglected an important source of evolution because the accumulation of TEs is attributable to drastic alterations in genome architecture, including insertions, deletions, duplications, translocations, and inversions. Our approach to the Zoonomia dataset was to provide future inquirers accurate and meticulous TE curations and to describe taxonomic variation among eutherians.
RESULTS: We annotated the TE content of 248 mammalian genome assemblies, which yielded a library of 25,676 consensus TE sequences, 8263 of which were previously unidentified TE sequences (available at https://dfam.org). We affirmed that the largest component of a typical mammalian genome is comprised of TEs (average 45.6%). Of the 248 assemblies, the lowest genomic percentage of TEs was found in the star-nosed mole (27.6%), and the largest percentage was seen in the aardvark (74.5%), whose increase in TE accumulation drove a corresponding increase in genome size, a correlation we observed across Eutheria. The overall genomic proportions of recently accumulated TEs were roughly similar across most mammals in the dataset, with a few notable exceptions (see the figure). Diversity of recently accumulated TEs is highest among multiple families of bats, mostly driven by substantial DNA transposon activity. Our data also exhibit an increase of recently accumulated DNA transposons among carnivore lineages over their herbivorous counterparts, which suggests that diet may play a role in determining the genomic content of TEs.
CONCLUSION: The copious TE data provided in this work emanated from the largest comprehensive TE curation effort to date. Considering the wide-ranging effects that TEs impose on genomic architecture, these data are an important resource for future inquiries into mammalian genomics and evolution and suggest avenues for continued study of these important yet understudied genomic denizens.
Figure caption: Boxplots depicting the range of recently accumulated TEs among mammals (by proportion of genome). Five categories of TE were examined: DNA transposons, long interspersed elements (LINEs), long terminal repeat (LTR) retrotransposons, rolling circle (RC) transposons, and short interspersed elements (SINEs). Species with the highest and lowest proportions for each TE type are indicated by a picture of the organism and its common name. With regard to RC and DNA transposons, we found that most mammalian genome assemblies exhibit essentially zero recent accumulation (RC: 240 of 248 mammals had <0.1%; DNA: 210 of 248 mammals had <0.1%). ILLUSTRATIONS: BRITTANY ANN HALE

3 citations


Journal Article
Katherine L. Moon, Heather J. Huson, Kathleen Morrill, Ming-Shan Wang, Xue Li, Krishnamoorthy Srikanth, Gavin J. Svenson, Gregory R. Andrews, Joel C. Armstrong, Matteo Bianchi, Bruce W. Birren, Kevin R. Bredemeyer, Ana M Breit, Matthew J. Christmas, Hiram Clawson, Joana Damas, Federica Di Palma, Mark Diekhans, Michael X. Dong, Eduardo Eizirik, Kaili Fan, Cornelia E. Fanter, Nicole M. Foley, Karin Forsberg-Nilsson, Carlos Jose de Armas Garcia, John Gatesy, Steven Gazal, Diane P. Genereux, Linda Goodman, Jenna R. Grimshaw, Michaela K. Halsey, Andrew J. Harris, Glenn Hickey, Michael Hiller, Allyson Hindle, Robert Hubley, Graham M. Hughes, Jeremy A. Johnson, David Juan, Irene M. Kaplow, Elinor K. Karlsson, Kathleen C. Keough, Bogdan M. Kirilenko, Klaus-Peter Koepfli, Jennifer M. Korstian, Amanda Kowalczyk, Sergey V. Kozyrev, Alyssa J. Lawler, Colleen Lawless, Thomas Lehmann, Daniel Lévesque, Harris A. Lewin, Xue Li, Abigail L. Lind, Kerstin Lindblad-Toh, Ava Mackay-Smith, Voichita D. Marinescu, Tomas Marques-Bonet, Victor C. Mason, Jennifer R. S. Meadows, Wynn K. Meyer, Jill Moore, Lucas R. Moreira, Diana D. Moreno-Santillán, Gerard Muntané, William J. Murphy, Arcadi Navarro, Martin T. Nweeia, Sylvia Ortmann, Austin B. Osmanski, Benedict Paten, Nicole Paulat, Andreas R. Pfenning, BaDoi N. Phan, Katherine S. Pollard, Henry Pratt, David A. Ray, Steven K. Reilly, Jeb Rosen, Irina Ruf, Louise Ryan, Oliver A. Ryder, Pardis C. Sabeti, Daniel E. Schäffer, Aitor Serres, Beth Shapiro, Arian F.A. Smit, Mark S. Springer, Chaitanya Srinivasan, Cynthia C. Steiner, Jessica M. Storer, Kevin A.M. Sullivan, Patrick F. Sullivan, Elisabeth Sundström, Megan A. Supple, Ross Swofford, Joy-El R B Talbot, Emma C. Teeling, Jason Turner-Maier, Alejandro A. Valenzuela, Franziska Wagner, Ola Wallerman, Chao Wang, Juehan Wang, Zhiping Weng, Aryn P. Wilder, Morgan Wirthlin, James Xue, Xiaomeng Zhang 
28 Apr 2023-Science
TL;DR: The authors reconstructed the phenotype of Balto, the heroic sled dog renowned for transporting diphtheria antitoxin to Nome, Alaska, in 1925, using evolutionary constraint estimates from the Zoonomia alignment of 240 mammals and 682 genomes from dogs and wolves of the 21st century.
Abstract: We reconstruct the phenotype of Balto, the heroic sled dog renowned for transporting diphtheria antitoxin to Nome, Alaska, in 1925, using evolutionary constraint estimates from the Zoonomia alignment of 240 mammals and 682 genomes from dogs and wolves of the 21st century. Balto shares just part of his diverse ancestry with the eponymous Siberian husky breed. Balto’s genotype predicts a combination of coat features atypical for modern sled dog breeds, and a slightly smaller stature. He had enhanced starch digestion compared with Greenland sled dogs and a compendium of derived homozygous coding variants at constrained positions in genes connected to bone and skin development. We propose that Balto’s population of origin, which was less inbred and genetically healthier than that of modern breeds, was adapted to the extreme environment of 1920s Alaska.
Description:
INTRODUCTION: It has been almost 100 years since the sled dog Balto helped save the community of Nome, Alaska, from a diphtheria outbreak. Today, Balto symbolizes the indomitable spirit of the sled dog. He is immortalized in statue and film, and is physically preserved and on display at the Cleveland Museum of Natural History. Balto represents a dog population that was reputed to tolerate harsh conditions at a time when northern communities were reliant on sled dogs. Investigating Balto’s genome sequence using technologies for sequencing degraded DNA offers a new perspective on this historic population.
RATIONALE: Analyzing high-coverage (40.4-fold) DNA sequencing data from Balto through comparison with large genomic data resources offers an opportunity to investigate genetic diversity and genome function. We leveraged the genome sequence data from 682 dogs, including both working sled dogs and dog breeds, as well as evolutionary constraint scores from the Zoonomia alignment of 240 mammals, to reconstruct Balto’s phenotype and investigate his ancestry and what might distinguish him from modern dogs.
RESULTS: Balto shares just part of his diverse ancestry with the eponymous Siberian husky breed and was more genetically diverse than both modern breeds and working sled dogs. Both Balto and working sled dogs had a lower burden of rare, potentially damaging variation than modern breeds and fewer potentially damaging variants, suggesting that they represent genetically healthier populations. We inferred Balto’s appearance on the basis of genomic variants known to shape physical characteristics in dogs today. We found that Balto had a combination of coat features atypical for modern sled dog breeds and a slightly smaller stature, inferences that are confirmed by comparison to historical photographs. Balto’s ability to digest starch was enhanced compared to wolves and Greenland sled dogs but reduced compared to modern breeds. He carried a compendium of derived homozygous coding variants at constrained positions in genes connected to bone and skin development, which may have conferred a functional advantage.
CONCLUSION: Balto belonged to a population of small, fast, and fit sled dogs imported from Siberia. By sequencing his genome from his taxidermied remains and analyzing these data in the context of large comparative and canine datasets, we show that Balto and his working sled dog contemporaries were more genetically diverse than modern breeds and may have carried variants that helped them survive the harsh conditions of 1920s Alaska. Although the era of Balto and his contemporaries has passed, comparative genomics, supported by a growing collection of modern and past genomes, can provide insights into the selective pressures that shaped them.
Figure caption: Balto, famed 20th-century Alaskan sled dog, shares common ancestry with modern Asian and Arctic canine lineages. In an unsupervised admixture analysis, Balto’s ancestry, representing 20th-century Alaskan sled dogs, is assigned predominantly to four Arctic lineage dog populations. He had no discernable wolf ancestry. The Alaskan sled dogs (a working population) did not fall into a distinct ancestry cluster but shared about a third of their ancestry with Balto in the supervised admixture analysis. Balto and working sled dogs carried fewer constrained and missense rare variants than modern dog breeds. IMAGE CREDIT: K. MORRILL

1 citation