Author
Barbara Robbertse
Other affiliations: Oregon State University, Cornell University, Centre national de la recherche scientifique
Bio: Barbara Robbertse is an academic researcher at the National Institutes of Health. The author has contributed to research on the topics Genome and RefSeq. The author has an h-index of 22 and has co-authored 29 publications that have received 7,277 citations. Previous affiliations of Barbara Robbertse include Oregon State University and Cornell University.
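For context, an h-index is the largest number h such that h of an author's papers each have at least h citations. The sketch below is illustrative only: the function name `h_index` is an assumption, and the input is just the five citation counts listed on this page, not the full set of 29 publications. It therefore gives a lower bound for that subset and does not reproduce the reported value of 22.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here, so stop
    return h


# Citation counts of the five papers shown on this page (a subset only).
listed = [4104, 1085, 685, 592, 507]
print(h_index(listed))
```

Because every one of the five listed papers has far more than five citations, the subset alone cannot yield a value above 5; the full publication list is needed to reach the reported figure.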
Papers
••
TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.
Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
4,104 citations
••
University of New Mexico, Los Alamos National Laboratory, Novozymes, University of Provence, VTT Technical Research Centre of Finland, Pacific Northwest National Laboratory, Joint Genome Institute, United States Department of Agriculture, Vienna University of Technology, Pontifical Catholic University of Chile, Oregon State University, Genencor
TL;DR: This work assembled 89 scaffolds to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models, providing a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.
Abstract: Trichoderma reesei is the main industrial source of cellulases and hemicellulases used to depolymerize biomass to simple sugars that are converted to chemical intermediates and biofuels, such as ethanol. We assembled 89 scaffolds (sets of ordered and oriented contigs) to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models. Unexpectedly, considering the industrial utility and effectiveness of the carbohydrate-active enzymes of T. reesei, its genome encodes fewer cellulases and hemicellulases than any other sequenced fungus able to hydrolyze plant cell wall polysaccharides. Many T. reesei genes encoding carbohydrate-active enzymes are distributed nonrandomly in clusters that lie between regions of synteny with other Sordariomycetes. Numerous genes encoding biosynthetic pathways for secondary metabolites may promote survival of T. reesei in its competitive soil habitat, but genome analysis provided little mechanistic insight into its extraordinary capacity for protein secretion. Our analysis, coupled with the genome sequence data, provides a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.
1,085 citations
••
TL;DR: The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration.
Abstract: The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy.
685 citations
••
Oregon State University, Yale University, Duke University, University of Tennessee, Clark University, Kaiserslautern University of Technology, Centraalbureau voor Schimmelcultures, University of Copenhagen, University of Tabriz, Harvard University, University of Pretoria, ATCC, Louisiana State University, University of Texas at Austin, Aberystwyth University, United States Department of Agriculture, Field Museum of Natural History, Pennsylvania State University, University of California, Berkeley, University of North Carolina at Chapel Hill, Stellenbosch University, Free University of Berlin, Washington State University, Brandon University, Landcare Research, University of Helsinki, University of Giessen, University of Nottingham, Swedish Museum of Natural History, Royal Botanic Garden Edinburgh
TL;DR: A 6-gene, 420-species maximum-likelihood phylogeny of Ascomycota, the largest phylum of Fungi, is presented together with a phylogenetic informativeness analysis of all 6 genes and a series of ancestral character state reconstructions, which support a terrestrial, saprobic ecology as ancestral.
Abstract: We present a 6-gene, 420-species maximum-likelihood phylogeny of Ascomycota, the largest phylum of Fungi. This analysis is the most taxonomically complete to date with species sampled from all 15 currently circumscribed classes. A number of superclass-level nodes that have previously evaded resolution and were unnamed in classifications of the Fungi are resolved for the first time. Based on the 6-gene phylogeny we conducted a phylogenetic informativeness analysis of all 6 genes and a series of ancestral character state reconstructions that focused on morphology of sporocarps, ascus dehiscence, and evolution of nutritional modes and ecologies. A gene-by-gene assessment of phylogenetic informativeness yielded higher levels of informativeness for protein genes (RPB1, RPB2, and TEF1) as compared with the ribosomal genes, which have been the standard bearer in fungal systematics. Our reconstruction of sporocarp characters is consistent with 2 origins for multicellular sexual reproductive structures in Ascomycota, once in the common ancestor of Pezizomycotina and once in the common ancestor of Neolectomycetes. This first report of dual origins of ascomycete sporocarps highlights the complicated nature of assessing homology of morphological traits across Fungi. Furthermore, ancestral reconstruction supports an open sporocarp with an exposed hymenium (apothecium) as the primitive morphology for Pezizomycotina with multiple derivations of the partially (perithecia) or completely enclosed (cleistothecia) sporocarps. Ascus dehiscence is most informative at the class level within Pezizomycotina with most superclass nodes reconstructed equivocally. Character-state reconstructions support a terrestrial, saprobic ecology as ancestral. In contrast to previous studies, these analyses support multiple origins of lichenization events with the loss of lichenization as less frequent and limited to terminal, closely related species.
592 citations
••
National Institutes of Health, Kean University, Murdoch University, Agricultural Research Service, University of Graz, Hirosaki University, Mae Fah Luang University, Biotec, University of North Carolina at Chapel Hill, Uppsala University, Masaryk University, DePaul University, Oregon State University, Illinois Natural History Survey, University of Illinois at Chicago, University of Chicago, University of Minnesota, Universidade Nova de Lisboa, Prince of Songkla University, University of Hong Kong, Blaise Pascal University, University of Illinois at Urbana–Champaign, Technical University of Madrid, Tuscia University, Tottori University, University of Pretoria, Stellenbosch University
TL;DR: A genomic comparison of 6 dothideomycete genomes with other fungi finds a high level of unique proteins associated with the class, supporting its delineation as a separate taxon.
507 citations
Cited by
••
TL;DR: The Trinity method for de novo assembly of full-length transcripts is presented and evaluated on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.
Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
15,665 citations
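The Trinity abstract above mentions constructing sets of de Bruijn graphs from reads. As a rough illustration of that data structure only, here is a minimal sketch that builds a de Bruijn graph from toy reads. It is not Trinity's implementation: the function name `de_bruijn_edges`, the reads, and the choice of k are all assumptions made for the example.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read.

    Every k-mer in a read contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
    Repeated edges are kept, so list length reflects how often a k-mer was seen.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy overlapping reads (hypothetical, not real sequencing data).
reads = ["ATGGCGT", "GGCGTGC"]
graph = de_bruijn_edges(reads, k=4)

for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Overlapping reads share nodes in the graph, so a path through it can spell a sequence longer than any single read. A real assembler additionally has to handle sequencing errors, reverse complements, and branching caused by isoforms or duplicated genes, none of which this sketch attempts.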
••
Broad Institute, Commonwealth Scientific and Industrial Research Organisation, Massachusetts Institute of Technology, Hebrew University of Jerusalem, Science for Life Laboratory, Pittsburgh Supercomputing Center, Oklahoma State University–Stillwater, Griffith University, University of Wisconsin-Madison, Dresden University of Technology, California Institute for Quantitative Biosciences, Flanders Institute for Biotechnology, Parco Tecnologico Padano, United States Department of Agriculture, Purdue University, Indiana University
TL;DR: This protocol provides a workflow for genome-independent transcriptome analysis leveraging the Trinity platform and presents Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes.
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
6,369 citations
••
TL;DR: The content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases, and the newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined.
Abstract: KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information.
5,741 citations
••
Conrad L. Schoch, Keith A. Seifert, Sabine M. Huhndorf, Vincent Robert, and 157 more authors (59 institutions)
TL;DR: Among the regions of the ribosomal cistron, the internal transcribed spacer (ITS) region has the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation.
Abstract: Six DNA regions were evaluated as potential DNA barcodes for Fungi, the second largest kingdom of eukaryotic life, by a multinational, multilaboratory consortium. The region of the mitochondrial cytochrome c oxidase subunit 1 used as the animal barcode was excluded as a potential marker, because it is difficult to amplify in fungi, often includes large introns, and can be insufficiently variable. Three subunits from the nuclear ribosomal RNA cistron were compared together with regions of three representative protein-coding genes (largest subunit of RNA polymerase II, second largest subunit of RNA polymerase II, and minichromosome maintenance protein). Although the protein-coding gene regions often had a higher percent of correct identification compared with ribosomal markers, low PCR amplification and sequencing success eliminated them as candidates for a universal fungal barcode. Among the regions of the ribosomal cistron, the internal transcribed spacer (ITS) region has the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation. The nuclear ribosomal large subunit, a popular phylogenetic marker in certain groups, had superior species resolution in some taxonomic groups, such as the early diverging lineages and the ascomycete yeasts, but was otherwise slightly inferior to the ITS. The nuclear ribosomal small subunit has poor species-level resolution in fungi. ITS will be formally proposed for adoption as the primary fungal barcode marker to the Consortium for the Barcode of Life, with the possibility that supplementary barcodes may be developed for particular narrowly circumscribed taxonomic groups.
4,116 citations