Author
Jason T. Howard
Other affiliations: University of North Carolina at Chapel Hill, Duke University, Metropolitan University ...
Bio: Jason T. Howard is an academic researcher at Rockefeller University who has contributed to research in the topics Genomics and Genome. The author has an h-index of 22 and has co-authored 35 publications that have received 7,093 citations. Previous affiliations of Jason T. Howard include University of North Carolina at Chapel Hill and Duke University.
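For readers unfamiliar with the metric quoted in the bio, the h-index is the largest number h such that h of an author's papers each have at least h citations. The following is a minimal sketch of that definition; the function name `h_index` and the sample citation counts are illustrative assumptions and are not the author's actual per-paper record.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts, used only to illustrate the definition.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))  # prints 4
```

With these sample counts, four papers have at least four citations each, but there are not five papers with at least five, so the result is 4.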
Papers
••
Duke University; University of Texas at Austin; Heidelberg Institute for Theoretical Studies; American Museum of Natural History; Beijing Genomics Institute; Xi'an Jiaotong University; New Mexico State University; University of Sydney; University of California; Uppsala University; University of Copenhagen; Okinawa Institute of Science and Technology; University of Georgia; Griffith University; Catalan Institution for Research and Advanced Studies; Joint Institute for Nuclear Research; Oak Ridge National Laboratory; Aarhus University; Washington University in St. Louis; University of California, Santa Cruz; Cardiff University; Kunming Institute of Zoology; China Agricultural University; Louisiana State University; Tulane University; Copenhagen Zoo; Oregon Health & Science University; Federal University of Pará; Technical University of Denmark; Canterbury Museum; Curtin University; Novosibirsk State University; Smithsonian Institution; National University of Singapore; National Museum of Natural History; Nova Southeastern University; Occidental College; University of Edinburgh; Harvard University; University of California, San Francisco; University of Florida; University of Illinois at Urbana–Champaign
TL;DR: A genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves recovered a highly resolved tree that confirms previously controversial sister or close relationships and identifies the first divergence in Neoaves, two groups the authors named Passerea and Columbea.
Abstract: To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
1,624 citations
••
TL;DR: This work introduces a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences, leading to substantially better assemblies than current sequencing strategies.
Abstract: Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
987 citations
••
Beijing Genomics Institute; University of Copenhagen; Royal Veterinary College; Seoul National University; University of Nebraska–Lincoln; University of Porto; University of South Carolina; Montclair State University; Uppsala University; National University of Singapore; University of California, Berkeley; South China University of Technology; Chinese Academy of Sciences; Kunming Institute of Zoology; Howard Hughes Medical Institute; Aberystwyth University; University of Kent; University of California, Riverside; Mississippi State University; Austral University of Chile; Swedish University of Agricultural Sciences; China Agricultural University; Cardiff University; Copenhagen Zoo; Louisiana State University; Washington University in St. Louis; Xi'an Jiaotong University; University of California, Santa Cruz; Nova Southeastern University Oceanographic Center; Smithsonian Conservation Biology Institute; National Museum of Natural History; Natural History Museum; University of California, San Francisco; Harvard University; University of Florida; University of Edinburgh; New Mexico State University; Macau University of Science and Technology; Curtin University
TL;DR: This work explored bird macroevolution using full genomes from 48 avian species representing all major extant clades to reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
Abstract: Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
872 citations
••
Washington University in St. Louis; University of Illinois at Urbana–Champaign; Uppsala University; University of California, Los Angeles; Wellcome Trust Sanger Institute; University of Oxford; Duke University; University of Houston; University of Kent; University of Oviedo; Weizmann Institute of Science; Institute for Systems Biology; Louisiana State University; University of Colorado Denver; University of Washington; University of Sheffield; University of Edinburgh; Max Planck Society; Free University of Berlin; Harvard University; Monsanto
TL;DR: This work shows that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets and shows evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience.
Abstract: The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken, the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.
837 citations
••
University of California, Davis; Yale University; Laval University; Joint Genome Institute; Centre national de la recherche scientifique; École normale supérieure de Cachan; Wayne State University; University of Georgia; University of Udine; Wellcome Trust Sanger Institute; University of California, Santa Cruz; University of Notre Dame; European Bioinformatics Institute; Duke University; Baylor College of Medicine; Broad Institute; University of Washington; University of Maryland, College Park; University of California, Berkeley; University of Lisbon; Howard Hughes Medical Institute; University of California, San Francisco; Cold Spring Harbor Laboratory; Royal Institute of Technology
TL;DR: Assemblathon 2 provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake), resulting in 43 submitted assemblies from 21 participating teams.
Abstract: Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes, regulatory sequences, and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
690 citations
Cited by
••
TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
14,103 citations
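The read summarization task described in the featureCounts entry above (counting the reads that map to each genomic feature) can be illustrated with a deliberately naive sketch. This is not featureCounts' implementation and does not use its feature blocking technique; it only groups features by chromosome in a dictionary and tests interval overlap. The function name `count_reads`, the tuple layouts, and the sample coordinates are assumptions made for illustration.

```python
from collections import defaultdict


def count_reads(features, reads):
    """Count reads overlapping each feature (naive illustration).

    features: list of (feature_id, chrom, start, end), 1-based inclusive.
    reads:    list of (chrom, start, end), 1-based inclusive.
    Simplification: a read overlapping several features is counted once
    for each of them.
    """
    # Group features by chromosome so each read is only compared against
    # features on its own chromosome.
    by_chrom = defaultdict(list)
    for feature_id, chrom, start, end in features:
        by_chrom[chrom].append((start, end, feature_id))

    counts = {feature_id: 0 for feature_id, _, _, _ in features}
    for chrom, r_start, r_end in reads:
        for f_start, f_end, feature_id in by_chrom.get(chrom, []):
            # Two closed intervals overlap when each starts before the other ends.
            if r_start <= f_end and f_start <= r_end:
                counts[feature_id] += 1
    return counts


# Hypothetical features and aligned reads.
features = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 300, 400),
    ("geneC", "chr2", 50, 150),
]
reads = [
    ("chr1", 150, 180),  # inside geneA
    ("chr1", 190, 310),  # spans geneA and geneB
    ("chr1", 500, 550),  # overlaps nothing
    ("chr2", 140, 160),  # overlaps geneC
    ("chr3", 1, 50),     # chromosome with no features
]
print(count_reads(features, reads))
# prints {'geneA': 2, 'geneB': 1, 'geneC': 1}
```

The inner loop scans every feature on a chromosome for every read, which is the kind of cost that an efficient summarization tool is designed to avoid; the sketch is meant only to make the counting task concrete.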
••
TL;DR: Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences, is presented, demonstrating that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences or Oxford Nanopore technologies.
Abstract: Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
4,806 citations
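The Canu entry above reports a contig NG50, a contiguity metric: the length L such that contigs of length L or longer together cover at least half of the estimated genome size. A minimal sketch of that calculation follows, assuming contig lengths and genome size are given in the same units; the function name `ng50` and the sample numbers are hypothetical and unrelated to the data sets in the abstract.

```python
def ng50(contig_lengths, genome_size):
    """Return the contig length at which the running total of contigs,
    taken from longest to shortest, first reaches half of genome_size.
    Returns None if the contigs never cover half of the genome."""
    target = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None


# Hypothetical contig lengths and genome size.
contigs = [50, 40, 30, 20, 10]
print(ng50(contigs, genome_size=200))  # prints 30

# Using the assembly's own total length instead of the genome size
# gives the related N50 statistic.
print(ng50(contigs, genome_size=sum(contigs)))  # prints 40
```

Because NG50 is normalized by genome size rather than assembly size, it allows assemblies of different total lengths to be compared on the same scale.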
••
TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.
Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
4,104 citations
••
TL;DR: This work presents a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing.
Abstract: We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
3,647 citations
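The accuracy figure in the HGAP entry above can be related to the Phred quality scale, where Q = -10 * log10(P) and P is the error probability. Treating the reported consensus accuracy as a per-base error rate is an assumption made here for illustration; under it, 99.999% accuracy corresponds to roughly Q50. The function name `phred_from_error` is hypothetical.

```python
import math


def phred_from_error(error_rate):
    """Convert an error probability (0 < error_rate <= 1) to a Phred quality score."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


# Error rates implied by 99.9% and 99.999% accuracy, written directly as
# probabilities to avoid floating-point subtraction noise.
for error_rate in (1e-3, 1e-5):
    print(f"error rate {error_rate:g} -> Q{phred_from_error(error_rate):.0f}")
```

Each additional 10 points on the Phred scale corresponds to a tenfold reduction in error probability.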
••
TL;DR: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses that includes the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, and new output formats to facilitate interoperability with downstream software.
Abstract: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses. PartitionFinder 2 is substantially faster and more efficient than version 1, and incorporates many new methods and features. These include the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, new output formats to facilitate interoperability with downstream software, and many new models of molecular evolution. PartitionFinder 2 is freely available under an open source license and works on Windows, OSX, and Linux operating systems. It can be downloaded from www.robertlanfear.com/partitionfinder. The source code is available at https://github.com/brettc/partitionfinder.
3,445 citations