Author
Nina Thayer
Other affiliations: Joint Genome Institute
Bio: Nina Thayer is an academic researcher from Los Alamos National Laboratory. The author has contributed to research on topics including Bacillus thuringiensis and Chromosome 16. The author has an h-index of 11 and has co-authored 14 publications receiving 2,274 citations. Previous affiliations of Nina Thayer include Joint Genome Institute.
Papers
••
Los Alamos National Laboratory, University of New Mexico, Novozymes, University of Provence, VTT Technical Research Centre of Finland, Pacific Northwest National Laboratory, Joint Genome Institute, United States Department of Agriculture, Vienna University of Technology, Pontifical Catholic University of Chile, Oregon State University, Genencor
TL;DR: This work assembled 89 scaffolds to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models, providing a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.
Abstract: Trichoderma reesei is the main industrial source of cellulases and hemicellulases used to depolymerize biomass to simple sugars that are converted to chemical intermediates and biofuels, such as ethanol. We assembled 89 scaffolds (sets of ordered and oriented contigs) to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models. Unexpectedly, considering the industrial utility and effectiveness of the carbohydrate-active enzymes of T. reesei, its genome encodes fewer cellulases and hemicellulases than any other sequenced fungus able to hydrolyze plant cell wall polysaccharides. Many T. reesei genes encoding carbohydrate-active enzymes are distributed nonrandomly in clusters that lie between regions of synteny with other Sordariomycetes. Numerous genes encoding biosynthetic pathways for secondary metabolites may promote survival of T. reesei in its competitive soil habitat, but genome analysis provided little mechanistic insight into its extraordinary capacity for protein secretion. Our analysis, coupled with the genome sequence data, provides a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.
1,085 citations
••
TL;DR: Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.
Abstract: Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G + C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.
307 citations
••
TL;DR: Whole-genome sequencing of two members of the B. cereus group, B. thuringiensis 97-27 and B. cereus E33L, identified shared and unique genes relative to the genomes of the pathogenic strains B. anthracis Ames and B. cereus G9241, and revealed differences in virulence, metabolic competence, structural components, and regulatory mechanisms.
Abstract: Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis are closely related gram-positive, spore-forming bacteria of the B. cereus sensu lato group. While independently derived strains of B. anthracis reveal conspicuous sequence homogeneity, environmental isolates of B. cereus and B. thuringiensis exhibit extensive genetic diversity. Here we report the sequencing and comparative analysis of the genomes of two members of the B. cereus group, B. thuringiensis 97-27 subsp. konkukian serotype H34, isolated from a necrotic human wound, and B. cereus E33L, which was isolated from a swab of a zebra carcass in Namibia. These two strains, when analyzed by amplified fragment length polymorphism within a collection of over 300 B. cereus, B. thuringiensis, and B. anthracis isolates, appear closely related to B. anthracis. The B. cereus E33L isolate appears to be the nearest relative to B. anthracis identified thus far. Whole-genome sequencing of B. thuringiensis 97-27 and B. cereus E33L was undertaken to identify shared and unique genes among these isolates in comparison to the genomes of pathogenic strains B. anthracis Ames and B. cereus G9241 and nonpathogenic strains B. cereus ATCC 10987 and B. cereus ATCC 14579. Comparison of these genomes revealed differences in terms of virulence, metabolic competence, structural components, and regulatory mechanisms.
231 citations
••
TL;DR: The complete DNA sequence of the aerobic cellulolytic soil bacterium Cytophaga hutchinsonii, which belongs to the phylum Bacteroidetes, is presented and many genes thought to encode proteins involved in cellulose utilization were identified.
Abstract: The complete DNA sequence of the aerobic cellulolytic soil bacterium Cytophaga hutchinsonii, which belongs to the phylum Bacteroidetes, is presented. The genome consists of a single, circular, 4.43-Mb chromosome containing 3,790 open reading frames, 1,986 of which have been assigned a tentative function. Two of the most striking characteristics of C. hutchinsonii are its rapid gliding motility over surfaces and its contact-dependent digestion of crystalline cellulose. The mechanism of C. hutchinsonii motility is not known, but its genome contains homologs for each of the gld genes that are required for gliding of the distantly related bacteroidete Flavobacterium johnsoniae. Cytophaga-Flavobacterium gliding appears to be novel and does not involve well-studied motility organelles such as flagella or type IV pili. Many genes thought to encode proteins involved in cellulose utilization were identified. These include candidate endo-β-1,4-glucanases and β-glucosidases. Surprisingly, obvious homologs of known cellobiohydrolases were not detected. Since such enzymes are needed for efficient cellulose digestion by well-studied cellulolytic bacteria, C. hutchinsonii either has novel cellobiohydrolases or has an unusual method of cellulose utilization. Genes encoding proteins with cohesin domains, which are characteristic of cellulosomes, were absent, but many proteins predicted to be involved in polysaccharide utilization had putative D5 domains, which are thought to be involved in anchoring proteins to the cell surface.
220 citations
••
TL;DR: The 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin, revealed 880 protein-coding genes, including metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia.
Abstract: Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,670 aligned transcripts, 19 transfer RNA genes, 341 pseudogenes and three RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. Whereas the segmental duplications of chromosome 16 are enriched in the relatively gene-poor pericentromere of the p arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.
146 citations
Cited by
••
TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
3,989 citations
••
TL;DR: A draft genome sequence of the red jungle fowl, Gallus gallus, provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes.
Abstract: We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome, composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes, provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
2,579 citations
••
TL;DR: Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.
Abstract: The first wave of information from the analysis of the human genome revealed SNPs to be the main source of genetic and phenotypic human variation. However, the advent of genome-scanning technologies has now uncovered an unexpectedly large extent of what we term 'structural variation' in the human genome. This comprises microscopic and, more commonly, submicroscopic variants, which include deletions, duplications and large-scale copy-number variants - collectively termed copy-number variants or copy-number polymorphisms - as well as insertions, inversions and translocations. Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.
1,804 citations
••
TL;DR: An analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.
Abstract: Transcription factors are key cellular components that control gene expression: their activities determine how cells function and respond to the environment. Currently, there is great interest in research into human transcriptional regulation. However, surprisingly little is known about these regulators themselves. For example, how many transcription factors does the human genome contain? How are they expressed in different tissues? Are they evolutionarily conserved? Here, we present an analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation. Much remains to be explored, but this study provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.
1,489 citations
••
TL;DR: An increased understanding of the disorder's underlying genetic, molecular, and cellular mechanisms and a better appreciation of its progression and systemic manifestations have laid out the foundation for the development of clinical trials and potentially effective treatments.
1,319 citations